State Space Methods in gretl

gretl is a general-purpose econometric package, whose most important characteristic is being free software . This ensures that its source code is freely available under the general public license (GPL) and, like most GPL software, that it can be used free of charge. As of version 1.8.1 (released in May 2009), it oﬀers a mechanism for handling linear state space models in a reasonably general and eﬃcient way. This article illustrates its main features with two examples.

1. Introduction 1.1. gretl and the free software movement gretl (Cottrell and Lucchetti 2011) is an econometric package which aims to implement the widest possible array of statistical procedures. One of its main characteristics is that gretl is free software, in the sense that it is released under the general public license (GPL), like more famous software projects such as the Linux kernel, other statistical software such as R, or the GNU project. In fact, gretl is part of the GNU project; for details on the GPL and the GNU project, see Yalta and Lucchetti (2008) or Deek and McHugh (2008).
As most GPL-licensed software, gretl is available free of charge (it can be downloaded from http://gretl.sourceforge.net/), which explains at least in part its growing popularity (see Lucchetti 2009 for some statistics). Allin Cottrell, the founder and leader of the project, provides a history of gretl and its future prospects in Cottrell (2009).
To the end user, it may seem that the main consequence of gretl being free software is that its price is zero; however, there are many more aspects to take into consideration. For example, free software projects can (and, as a rule, do) benefit from one another in terms of code re-use: parts of gretl's source code were taken and adapted from several other free projects, such as R (R Development Core Team 2011) and Gnumeric (The Gnumeric Team 2010) and in the same way other free projects (R, for one) incorporate some of gretl's code. gretl uses a number of free libraries which spare the development team from re-implementing some algorithms: for example, gretl employs the FFTW library (itself a GPL-licensed projectsee Frigo and Johnson 2005) for the discrete Fourier transform, which ensures not only that the algorithms are among the best available, but also that any future improvement in FFTW will be automatically passed to gretl.
Apart from code reuse, another common trait which gretl shares with the rest of the free software world is the extreme friendliness and vivacity of the online community, which effectively acts as a 24/7 online helpdesk.

State-space modelling
Since its 1.8.1 version, which was released on 2009-05-21, gretl offers a mechanism for handling linear state space models in a reasonably general way. Given that gretl's friendly and intuitive graphical user interface (GUI) is commonly considered as one of its strongest points (see for example Smith and Mixon 2006), the reader may find it surprising that the only way to access the program's features discussed here is through scripting. In fact, creating a GUI which is intuitive yet general enough is not trivial and there has been some debate among the developers as to the best way to expose the state space modelling functionalities through the GUI, but a consensus has not been found. 1 As a consequence, in order to describe the facilities that gretl offers for working with state space models, it is best to think of gretl as a programming language. The main idea behind gretl's implementation of state space models is that the user can define a state space model which becomes "the" model within the context in which it is defined and all subsequently called functions. The kalman environment is used to set it up and two specialized functions, kfilter() and ksmooth(), 2 are used to perform the actual filtering. As an example, consider this very minimal script: nulldata 10 set seed 12345 y = normal() kalman obsy y obsymat 1 obsvar 1 statemat 1 statevar 1 end kalman series v kfilter(&v) print y v -o 1 In fact, the gretl development team would welcome suggestions. 2 An additional function, ksimul() is available for using state space models to generate simulated data, but will not be described here. See Cottrell and Lucchetti (2009) for details.
Leaving details aside for a moment, the structure of the script should be rather clear. The first three lines of code generate a white noise process 10 observations long. The next seven lines of code set up the state space model, which in this example is The last three lines of code perform a forward pass of the filter, store the prediction errors, and print them together with the original data. Running the code above produces 3 : If a more "structural" programming approach is needed, so that the algorithm is split into subprograms, it is important to note that a kalman environment is local to a function. So for example, suppose that the basic structure of a gretl script is as follows: When function ss1 is called (which fits state space model 1, say), the do_stuff function (which does the filtering, estimation, and so forth) will act on the state space model defined in that function, which may be completely different from the one specified in the function ss2 (which fits state space model 2, say). The implementation of the forwards and backwards filtering algorithm is completely written in C (like the rest of gretl), which ensures excellent numerical performance. 4 If parameter estimation is necessary, it can be performed via the mle command, which uses numerical methods to maximize a likelihood function. Hence, a highly idealized flowchart for implementing a statistical model amenable to a state space representation (such as (V)AR(I)MA, structural time series models and others) in gretl runs like this: 1. set up the model by defining its state space representation in a kalman block; 2. set up an mle block, in which the system matrices are filled with the unknown parameters and the likelihood, computed by running the forward pass filter via the kfilter() function, is maximized; 3. after estimation is done, the function ksmooth() can be used to perform fixed-interval smoothing and the quantities of interest, such as states, can be accessed by the user.
The two main ingredients, the kalman block and the mle block, can be briefly illustrated as follows.

The kalman block
A kalman block is a series of statements that describe the system matrices of the state space model. For example, the code snippet kalman ... obsymat foo ... end kalman would tell gretl that a matrix called foo should be used as the system matrix which premultiplies the states in the observation equation (Z t in the terminology adopted in Commandeur, Koopman, and Ooms 2011). The contents of the matrix foo can then be modified via ordinary gretl commands, without the need for re-declaring the whole model. Any element of the system matrices can therefore be assigned a value at any time in a very convenient and intuitive way, simply by modifying the corresponding matrix. Special syntax constructs are also available for specifying time-varying matrices. Table 1 translates gretl keywords into the notation adopted here. Note that gretl's syntax does not provide the equivalent of the R t matrix; however, this is not a serious limitations, since the statevar matrix needs not be invertible, so it is more accurate to say that the keyword statevar is used to specify R t Q t R t . 5

Keyword Symbol
Moreover, the additional keywords obsx and obsxmat can be used for introducing exogenous variables in the observation equation. Some defaults are available: for example, if the inistate keyword is omitted, a 1 is understood to be a zero vector (see Cottrell and Lucchetti 2009 for a complete list). 6 As of version 1.9.1, it is also possible to introduce an "intercept" into the state transition equation via the stconst keyword: that is, the term c t in models for which the state transition equation can be written as of course, this could have been accomplished without a special keyword by re-defining the state vector appropriately, so that it includes one or more zero-variance states, but this could be rather inefficient numerically.
Once the model is set up, the kfilter() function performs a forward pass and several quantities of interest become available: the one-step-ahead prediction errors v t with associated variances F t and also a series containing the individual contributions to the log-likelihood which may be accessed via the $kalman_llt keyword; their sum is available via the $lnl accessor.
In Section 2 we provide commented estimation of the local level model on the Nile dataset. Section 3 contains a more complex model.

The mle block
The mle command is gretl's generic procedure for maximum likelihood estimation. By default, 5 Special syntax is available for state space models in which ηt and εt are contemporaneously correlated, so to match the general notation used, for example, in Harvey and Proietti (2005). For reasons of space, it will not be illustrated here; the interested reader will find more details in the gretl's User Guide.
6 As the state space code is a relatively recent addition, the precise details of the syntax may change somewhat in the future, to accommodate new features or simplify its syntax. If they do, the gretl development team will make every effort to minimize the inconvenience of backward-incompatible changes.
it uses the quasi-Newton BFGS algorithm, although an option is available to use L-BFGS as an alternate algorithm (see Liu and Nocedal 1989). 7 Although the user can supply analytical derivatives, gretl can use a numerical approximation which is based on a 4-step Richardson extrapolation algorithm. This algorithm choice ensures that the numerically computed score is remarkably accurate for standard problems.
A miniature example of the usage of mle can be given as follows. Consider the estimation of an AR(1) model by conditional ML. The log-likelihood for one observation can be written as where ϕ() is the standard normal density function. This translates into the following code: sigma = 1 rho = 0 mle loglik = ln(dnorm(e/sigma)) -ln(sigma) e = y -rho * y(-1) params rho sigma end mle Once the parameters are initialized, the mle code block comprises an expression for the loglikelihood, an indication of which parameters have to be estimated, via the params keyword, and any intermediate computation (in this case calculation of the forecast errors, e).
The mle command offers the user three choices for the computation of the covariance matrix of the estimate: the outer product of gradients (the default), the inverse Hessian (numerical), or the robust "sandwich" estimator (see for example Gourieroux, Monfort, and Trognon 1984). 2. Case 1: The local level model applied to the Nile data gretl does not provide a built-in command for estimation of the local level model. However, such a model is reasonably simple to set up. In order to show a possible way to implement the local level model in gretl and use the Nile data as an example, we will first write a function performing the "core" task of estimating the model, and then call it from a "main" file.

The core model function
The local level model (described in more detail in the introductory article Commandeur et al.

of this volume) can be written as
where the only two parameters that must be estimated are the two variances σ 2 ε and σ 2 ξ .
In practice, a gretl script for estimating a local level model would have at its core a function more or less like the following (with comments interspersed): This is the standard way of defining a function in gretl: in this case, the function local_level returns a matrix holding the estimated variances and takes three arguments. The first one, y, contains a data series. The other two should contain pointers 8 to data series (as indicated by the * modifier) and are used to return the one-step-ahead prediction error v t and its variance F t , respectively. The null keyword indicates that they may be omitted, in which case the two series are discarded.
scalar ss1 ss2 scale sy = init_y(y, &ss1, &ss2, &scale) Here, the user-defined function init_y (not shown here, but available in the accompanying files) is used to compute a preliminary estimate of the variances and to rescale y so to facilitate numerical maximization of the log-likelihood. 9 kalman obsy sy obsymat 1 obsvar ss1 statemat 1 statevar ss2 end kalman --diffuse Here the state space representation is set up. The --diffuse option is used to indicate that a diffuse-prior algorithm will be used when filtering, setting P 1 = 10 7 · I. The gretl manual states: [Diffuse i]nitialization of the Kalman filter [. . . ] has been the subject of much discussion in the literature-see for example de Jong (1991); Koopman (1997). At present gretl does not implement any of the more elaborate proposals that have been made. 8 The usage of pointers in gretl is a much simplified application of the concept of pointers in the C programming language. gretl's concept of pointers is quite similar to the one found in the Ox programming language (see Doornik 2007). Usage of pointers is the standard way in gretl to have a function return more than a single object.
9 The algorithm used for initializing the two variances is based on the reduced form of the local level model: so that ∆yt is a MA(1) process; by a standard method-of-moments reasoning, consistent estimators of the two variances can be obtained from γ0 and γ1, the sample autocovariances of ∆yt, as follows: These two estimators can then be used as initial values.
However, the P 1 matrix can be set by the user via the inivar keyword, so more sophisticated alternatives are possible if necessary.
Note that the scalars ss1 and ss2 are specified as σ 2 ε and σ 2 ξ , respectively, via the obsvar and statevar. Assigning new values to these scalars would (and, in fact, will) modify the contents of the relevant system matrices in the state space representation.
Here ML estimation of the parameters is performed. Since gretl does not provide a constrained optimization routine, the actual parameters that are used for numerical maximization of the log-likelihood are the two natural logarithms of the variances, which in this example are contained in the vector theta. 10 Points to note: the construct loglik = ERR ? NA : misszero($kalman_llt) must be interpreted as follows 11 : the kfilter() returns a scalar (0 for no error, nonzero otherwise) and fills a series called $kalman_llt which contains the perobservation likelihood contributions. So, in case of error, the loglikelihood is set to missing, which would force mle to abort; otherwise, missing values in the loglikelihood contributions (because of missing values in y t ) are set to 0.
The scalars ss1 and ss2 get filled with exp(θ 1 ) and exp(θ 2 ). The optimization is performed in terms of the vector theta, as indicated by the params keyword.
The --hessian option instructs mle to compute the variance-covariance matrix ofθ from the numerical Hessian. matrix variances = exp(theta)' matrix ret = scale * variances series ll = $kalman_llt T = sum(ok(ll)) matrix V = $vcv .* (variances*variances') print_results (variances~V, scale, T, $lnl) 10 In fact, estimation may be based on the so-called "concentrated likelihood" (see Durbin and Koopman 2001, p. 31) to make estimation numerically more efficient; however, given the aim of this paper, I chose not to complicate the code.
11 Perhaps a few readers may be unfamiliar with the syntax y = A ? x0 : x1. This construct is used in several programming languages and sets y to x0 if A is nonzero and to x1 otherwise.
Here, the results obtained with ML estimation are re-transformed to the scale of the original y t variable and stored into the ret matrix, to make them available to the caller function; the variance-covariance matrix of the estimated parameters is computed via the delta method: The Jacobian is where indicates the Hadamard product (the corresponding gretl operator is .*) and the scalar T, the actual sample size, is computed as the number of non-missing observations via gretl's internal function ok(). The $vcv matrix is automatically generated after successful completion of the mle command. Then, the user-written function print_results formats the estimation output and prints it out.
matrix f_err f_var kfilter(&f_err, &f_var) series u = f_err * sqrt(scale) series v = f_var * scale return ret end function Finally, the forward pass filter is run once more to retrieve the one-step-ahead prediction error v t and its variance F t . These are rescaled, and stored into the two series u and v. As a last step, the estimates are returned.

Data input
The "main" script would look something like open nile.gdt series uhat Ft matrix Vars = local_level (nile, &uhat, &Ft) where first the dataset is opened, next the series for the prediction error and its variance are declared, and then parameter estimation is performed.

Estimation
After reading the data, the function local_level defined in Section 2.1 is called and the estimation performed, yielding the results shown in Table 2. The estimated parameters are coefficient   The ksmooth() function returns the smoothed states. If given a matrix-pointer to an argument, as shown, this is filled with the variance for the smoothed state. Figure 2 shows the original data and the estimated µ t series, together with a 90% confidence band.

Auxiliary residuals
At present, gretl lacks a native algorithm for performing disturbance smoothing, so an automatic mechanism for retrieving generalized residuals and their variances is not provided yet. However, they can be computed quite easily since all the necessary building blocks are available. The relevant formulae areε For the variance ofξ t , it is necessary to compute the quantity N t as defined in (Durbin and Koopman 2001, Equation 2.29), namely as a function of the variance of the prediction errors F t and the Kalman gain K t . These quantities can be obtained by running the kfilter() with pointer arguments. Again, this is not shown here for space reasons, but an example is available in the accompanying files. The code used for the computation and plotting of the auxiliary residuals is:

Forecasts
For the local level model, forecasts can be obtained quite simply by appending missing observations at the end of the sample and re-running the smoother. Since gretl automatically Here the user function loclev_sm (described above) was used again to extract the smoothed states from the kalman structure. The confint function (provided in the accompanying files as well) is mainly a convenience function to compute the "top" and "bottom" limits of the confidence interval. The forecasts and their 50% confidence band are shown in Figure 4.

The model
In this section, we will use a labour market model to exemplify how the setup examined in the previous section extends very naturally to the multivariate case. This model was first used in Lucchetti and Staffolani (1996) and later revisited in Russo and Veredas (2000). It is a bivariate unobserved-components model whose purpose is to provide an empirical counterpart to the "buffer stock" theory on working hours. Bentolila and Bertola (1990) and can be briefly sketched as follows: the total demand for labour can be written as L t = N t H t , where N t is the number of employees and H t is the number of hours worked. Firms face hiring and firing costs, so when a firm's desired use of labour factor changes, it may be advantageous to spread ∆N t through time and vary the amount of hours worked while adjustment is taking place. In other terms, if a firm needs to increase its usage of labour, it will increase working hours per worker while new workers are recruited.

The theoretical model on which the empirical model is based is a derivation of the ideas in
Assuming there is a standard weekly working timeH, if no adjustment costs were present, an increment in the demand for labour would lead to a parallel increase in N t simultaneously. If, however, the adjustment takes place through time, H t will deviate temporarily fromH in order to compensate for the extra workers. If we assume that firms optimize their level of labour input intertemporally, then it makes sense to treat the relative variation in the desired level of labour as an unanticipated shock with constant variance ∆ ln(L t ) = η t ∼ NID(0, σ 2 η ). Upon defining h t ≡ ln(H t /H) and n t ≡ ln(N t ), the relation ∆ ln(L t ) = ∆h t + ∆n t = η t holds by definition. If adjustment takes place over time, then, the time path of the hours and of the variation in labour force can be described as: where B(L) = 1 − ∆A(L), so the parameters of the two polynomials A(L) and B(L) are linked via b i = b i−1 − a i . A firm which faces an unanticipated labour demand shock η t will adjust L t instantaneously by varying h t while hiring; in the long term, h t goes back to 0 and n t is fully adjusted. The polynomial A(L) is assumed to be of some finite order p, so full adjustment takes place in a finite time 12 .
In order to match the actual features of observed data series, these two equations must be modified to take three additional facts into account: first, the size of the economy and capital/labour composition of the production technology vary exogenously through time, so ∆n t must contain some additional term to account for long-term trends. Moreover, institutional factors introduce sizeable seasonality in both h t and ∆n t . Finally, the actual number of hours worked per week could deviate from the firm's plan because of random factors, such as weather, power cuts and so forth.
The first effect will not be modelled explicitly, but can be captured via a local level component; seasonality is modelled via two separate seasonal unobserved components (see Durbin and Koopman 2001, p. 40). As a consequence, the empirical model can be cast as follows: where µ t is the local level component, γ t are the seasonal factors and ε t is the noise component 12 Note that neither A(0) nor B(0) are assumed to equal 1; identification is attained via the conditions B(1) = 1 and a0 = 1 − b0.
in the hours equation. In matrix notation, where a and b are vectors containing the parameters of the polynomials A(L) and B(L) in equations (8) and (7), respectively; η t is the vector [η t · · · η t−p ]; e 1 is a vector containing 1 as its first element and 0 elsewhere (with appropriate size depending on context). The L and C matrices are and u is a vector of ones.
The five structural disturbance terms refer to the local level component (u t ), the "buffer stock" component (η t ), the seasonal terms (ω n t and ω h t ) and the transitory hours shock (ε t ); they are assumed to be jointly normal and mutually uncorrelated. The unknown parameters are the coefficients of the polynomial B(L) (or, equivalently, A(L)) and the five variances of the structural disturbances.

Implementation and results
To exemplify the actual implementation of the above model, I used Australian data from February 1978 to June 2009. To be specific, the raw data were taken from the Australian Bureau of Statistics database, cat no. 6291.0.55.001, Table 09. For the employment series and hours series, I used the monthly seasonally unadjusted data series labelled A84378T (average weekly actual hours worked) and A84376L (total employment in thousands).
After compacting the data to a quarterly frequency through averaging, logs were taken. To take into account institutional factors that have drivenH down through time, the log hours series was then regressed against a constant and a linear trend and the residuals were used as h t . 13 The basic infrastructure for estimating the parameters in equations (9) and (10)  Initialization in the first three lines of code is kept intentionally naïve: the coefficients of the B() polynomial are set to 0 and all variances to 1.
The make_matrices function (not described here, but provided in the accompanying files) in the next three lines of code simply fills the system matrices with the elements of the vector θ. In this case, we initialize P 1 to 1000 · I, as diffuse initialization causes numerical problems.
Estimation is carried out by using the kalman block in the next eight lines of code and an mle block in the following six lines of code in a manner very similar to the Nile example in Section 2.
The function print_results in the last three lines of code, apart from printing out the estimates, computes the variance matrix of the estimated parameters.
Estimates for the Australian data are shown in Table 3. The pattern of the buffer stock adjustment appears to be quite reasonable: 53.3% of the adjustment seems to occur in the same quarter as the shock (parameter b_0) and a year after the shock the effect on working hours is small (7.3%, parameter a_4), but still significant.   where the function Giuditta_sm is structurally very similar to Giuditta_estimate: the elements of the vector θ are used to fill up the system matrices and, after having set the model up, instead of using mle to perform estimation, ksmooth is run to return the smoothed estimates of the quantities of interest 15 (shown in Figure 5).

Diagnostics
Diagnostic tests based on standardized prediction errors can be implemented in a similar vein   The computations are delegated to a function called Giuditta_prederr (provided in the accompanying files), which produces the standardized prediction errors. These are, in turn, analysed 16 via internal gretl commands, as follows: series speh = Giuditta_prederr(theta, V) smpl +4 ; qqplot speh corrgm speh 12 normtest speh --all and the output reads: Test for normality of speh: The correlogram and QQ-plot are shown in Figure 6. As can be seen, the correlogram does not show significant persistence of the residuals; non-normality seems not to be a problem either.

Conclusions
In addition to the examples reported here, gretl offers several other facilities to deal with state space models: for example, it is possible to specify a model with time-varying system matrices with little more effort than shown in the examples above, so techniques like the univariate treatment of multivariate time series (see Durbin and Koopman 2001, Chapter 6) is possible.
Moreover, it is also possible to set up a state space model and use it to simulate data, rather than filter existing time series. This may be very useful in those situations when inference via simulation is required.
Of course, gretl's implementation is still rather young, and there are several areas which offer room for improvement: 1. The syntax may benefit from some revision: some keywords may not be necessary for simulation; moreover, it would be useful if auxiliary residuals could be obtained automatically.
2. Native methods for disturbance smoothing or simulation smoothing are not provided yet. Since gretl provides access to all the system matrices and the state vectors, those methods can be implemented via user-level functions. However, the gretl development team is considering the possibility of adding native functions in the interest of computational speed in the future.
3. Structural time series models are known to have likelihoods with potential multiple maxima. An enlargement of the algorithm menu offered by the mle command could prove extremely useful in those cases.
However, the current implementation of state space models in gretl is already capable of handling the large majority of situations currently faced by practitioners. Most importantly, the source is open to inspection, as scientific software should be (some say all software). This ensures, at a minimum, that possible bugs or other shortcomings are, at least in principle, spotted and eradicated quickly. Ideally, the improvement in quality of gretl's state space modelling apparatus may become an effort undertaken by the scientific community at large, with all the obvious benefits that this would entail.