Bayesian Model Averaging and Jointness Measures for gretl

This paper presents a software package that implements Bayesian model averaging (BMA) for gretl, the Gnu Regression, Econometrics and Time-series Library. BMA is a model-building strategy that incorporates model uncertainty into conclusions about estimated parameters. It is an efficient tool for discovering the most probable models and obtaining estimates of their posterior characteristics. In recent years we have observed an increasing number of software packages devoted to BMA for different statistical and econometric environments. In this paper, we propose a BMA package for gretl, an increasingly popular free, open-source software package for econometric analysis with an easy-to-use GUI. We introduce the BMA package for linear regression models together with the jointness measures proposed by Ley and Steel (2007) and Doppelhofer and Weeks (2009).


Introduction
We know, from elementary statistical theory, that linear regression attempts to model the relationship between two or more variables by fitting a linear equation to observed data. In a classical approach, we usually rely on the Ordinary Least Squares (OLS) or the Maximum Likelihood (ML) estimates and the popular model selection criteria, i.e. AIC and BIC, to find the "best" model. The main problem arises when we have to select a "good" subset of variables from a large set of regressors. When the number of possible exogenous variables is K, the number of possible linear models is 2^K. If we have, for example, K = 30 possible regressors, the number of possible linear combinations equals 2^30 = 1,073,741,824. This means that it is very difficult, if not impossible, to find the estimates for all combinations. Moreover, Raftery, Madigan, and Hoeting (1997) show that standard variable selection procedures lead to different estimates and conflicting conclusions about the main questions of interest.
Bayesian model averaging is a useful alternative to other variable selection procedures, because it incorporates model uncertainty into conclusions about estimated parameters. BMA is the standard Bayesian solution to model uncertainty, where the inference on parameters is based on a weighted average over all possible models under consideration, rather than on one single regression model. These weights are the Bayesian posterior probabilities of the individual models.
There is a recent and growing literature on Bayesian model averaging. Examples of applications of BMA can be found in a number of works (see, for example, Hoeting, Madigan, Raftery, and Volinsky (1999) and Steel (2011) for a recent overview). Our software package for parameter estimation and model comparison of linear regression models is based on Fernández, Ley, and Steel (2001a,b) and Koop (2003). We use the Markov Chain Monte Carlo Model Composition (MC3) sampling algorithm developed by Madigan, York, and Allard (1995) to select a representative subset of models. Doppelhofer and Weeks (2005, 2009) define a jointness measure of dependence among explanatory variables that appear in linear regression models. We use that measure to identify whether different pairs of variables are substitutes, complements or neither. A similar jointness measure was also proposed by Ley and Steel (2007).
In this paper, we propose a BMA package for gretl. We can list several reasons why, in our opinion, it is important to address this topic. Gretl is an increasingly popular free, open-source software package for econometric analysis, both for students and academics. Unlike most other statistical software, it has an easy-to-use GUI. Our software package is, therefore, a free and easy tool for Bayesian model averaging. The rest of the paper is organized as follows: Section 2 briefly outlines Bayesian model averaging for linear regression models with the MC3 sampling algorithm and jointness measures. Section 3 provides an overview of the gretl package for BMA. Section 4 presents an empirical illustration. The final section concludes.

Bayesian inference in Normal linear regression models
In this section, we briefly introduce the main features of Bayesian inference in linear regression models. We present Bayesian estimation in linear regression models with the Normal-Gamma natural conjugate prior and many explanatory variables, as well as model selection and Bayesian model averaging techniques. Finally, we present the basics of the Markov Chain Monte Carlo Model Composition (MC3) sampling algorithm and jointness measures.

Bayesian estimation and model selection in Normal linear regression models
Consider the Normal linear regression models which differ in their explanatory variables. Suppose that we have K potential explanatory variables, which means there are 2^K possible models, and let M_r for r = 1, ..., 2^K denote the 2^K different models under consideration. Suppose also that y_i and x_i denote the observed data on the dependent and explanatory variables for i = 1, ..., N. The observations are placed in the (N × 1) vector y and the (N × k_r) matrix X_r containing the set of regressors included in model M_r. Thus, we can write our model as

y = α ι_N + X_r β_r + ε,    (1)

where ι_N is an (N × 1) vector of ones, β_r is a (k_r × 1) vector of unknown parameters, ε is an (N × 1) vector of errors which are assumed to be normally distributed, ε ∼ N(0_N, h^{-1} I_N), and h is the error precision, defined as h = 1/σ². Following Koop (2003), the prior for β_r is normally distributed (2), while we use noninformative priors for the intercept and the precision (3), where N(µ, Σ) denotes a Normal distribution with mean µ and variance Σ. The factor of proportionality g is the so-called Zellner (1986) g-prior. This prior is a convenient way to specify the prior variance matrix, because it reduces the number of prior variance parameters and considerably simplifies posterior computations. The gretl package offers the four most popular alternative Zellner g-priors, given in (4)-(7) (see Fernández et al. (2001a) and Moral-Benito (2010)):

• Unit Information Prior (g-UIP), recommended by Kass and Wasserman (1995)
• Risk Inflation Criterion (g-RIC), proposed by Foster and George (1994)
• Benchmark Prior, recommended by Fernández et al. (2001a)
• g-HQ prior, which mimics the Hannan-Quinn criterion, see Fernández et al. (2001a)

By Bayes' rule, the mean of the posterior distribution of the slope parameters β_r, conditional on model M_r, can be written as in (8). It is easy to see that if g ≈ 0, the mean of the posterior distribution (8) equals the OLS estimates. The posterior variance of β_r, conditional on model M_r, is given by (9)-(10), and the marginal data density, conditional on model M_r, may be written as in (11). In the Bayesian approach to comparing models, it is considered useful to employ probabilities to represent the degree of belief associated with alternative models. For the Normal linear regression models we can easily test two mutually exclusive (non-nested) and jointly exhaustive models with different subsets of variables. Using Bayes' theorem, the posterior odds ratio for a model M_l against a model M_n is given by

p(M_l | y) / p(M_n | y) = [p(M_l) / p(M_n)] × [p(y | M_l) / p(y | M_n)],    (12)

where p(M_l)/p(M_n) is the prior odds ratio and p(y | M_l)/p(y | M_n) is the Bayes factor. If the ratio (12) is larger than one, we can say that the data support model M_l over model M_n. In our package, we use two popular model priors:

• Binomial prior, i.e. p(M_r) = θ^{k_r} (1 − θ)^{K−k_r} for r = 1, ..., 2^K. Note that for θ = 0.5 we have the Uniform prior on the model space, i.e. p(M_r) = 2^{−K}.
• Binomial-Beta prior (see Gelman, Carlin, Stern, and Rubin (1997)), where Ξ denotes the model size.
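The four Zellner g-prior choices offered by the package have simple closed forms. The following Python sketch is illustrative only (it is not the package's Hansl code, and the function name is ours); it assumes the standard definitions discussed in Fernández et al. (2001a):

```python
import math

def zellner_g(N, K, kind="benchmark"):
    # Standard choices of Zellner's g (illustrative helper, hypothetical name).
    if kind == "uip":          # Unit Information Prior (Kass and Wasserman, 1995)
        return 1.0 / N
    elif kind == "ric":        # Risk Inflation Criterion (Foster and George, 1994)
        return 1.0 / K ** 2
    elif kind == "benchmark":  # Benchmark prior of Fernandez et al. (2001a)
        return 1.0 / max(N, K ** 2)
    elif kind == "hq":         # mimics the Hannan-Quinn criterion
        return 1.0 / math.log(N) ** 3
    raise ValueError(kind)

# With N = 72 observations and K = 41 regressors (the FLS data),
# the benchmark prior selects g = 1/K^2 because K^2 > N.
print(zellner_g(72, 41, "benchmark"))
```

Note that the benchmark prior automatically switches between the g-UIP and g-RIC behaviour depending on whether N or K² is larger.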
In our package, we only need to specify the prior expected model size E(Ξ). Note that in the case of the Binomial distribution we have E(Ξ) = Kθ, and the choice of E(Ξ) automatically produces a value for the prior inclusion probability θ. If we have the Binomial-Beta distribution, the average model size will satisfy E(Ξ) = a/(a + b) K. Here, we follow Ley and Steel (2009) and fix a = 1, and hence we obtain the value of the second hyperparameter b = (K − E(Ξ))/E(Ξ). It is easy to show that the posterior probability of model M_l is given by

p(M_l | y) = p(y | M_l) p(M_l) / Σ_{r=1}^{2^K} p(y | M_r) p(M_r).    (13)

The posterior density of the vector β is the average of the posterior densities p(β_r | y, M_r) conditional on the models,

p(β | y) = Σ_{r=1}^{2^K} p(M_r | y) p(β_r | y, M_r).    (14)

Once the model posterior probabilities have been calculated, we can also easily evaluate the mean and variance of the posterior distribution of the slope parameters,

E(β | y) = Σ_{r=1}^{2^K} p(M_r | y) E(β_r | y, M_r)    (15)

and

Var(β | y) = Σ_{r=1}^{2^K} p(M_r | y) [Var(β_r | y, M_r) + E(β_r | y, M_r) E(β_r | y, M_r)'] − E(β | y) E(β | y)'.    (16)

In a similar manner, we can find other characteristics of the posterior distribution (see for example (Koop, 2003, p. 266)). We might also be interested in the estimates of the posterior inclusion probability p(i | y) (PIP), i.e. the probability that, conditional on the data, but unconditional with respect to the model space, the variable x_i is relevant in explaining the dependent variable y (see Leamer (1978); Mitchell and Beauchamp (1988); Doppelhofer and Weeks (2009)). The posterior inclusion probability is calculated as the sum of the posterior model probabilities of all the models including variable x_i.
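For small K, the posterior model probabilities, PIPs and posterior moments described above can be computed exactly by enumerating all 2^K models. The following Python sketch is illustrative (the function name and interface are ours, not the package's Hansl code); it assumes the Zellner g-prior marginal likelihood of Fernández et al. (2001a), with demeaned data and a Binomial model prior:

```python
import numpy as np

def bma_enumerate(y, X, g, theta=0.5):
    """Exact BMA by enumerating all 2**K models (feasible only for small K).
    Returns posterior model probabilities and posterior inclusion probabilities."""
    N, K = X.shape
    yt = y - y.mean()               # demeaned dependent variable
    Xt = X - X.mean(axis=0)         # demeaned regressors
    tss = yt @ yt                   # total sum of squares
    log_ml, log_prior, subsets = [], [], []
    for r in range(2 ** K):
        idx = [i for i in range(K) if (r >> i) & 1]
        k = len(idx)
        if k == 0:
            ssr = tss
        else:
            Xr = Xt[:, idx]
            beta_ols, *_ = np.linalg.lstsq(Xr, yt, rcond=None)
            ssr = tss - yt @ Xr @ beta_ols
        # log marginal likelihood under Zellner's g-prior (up to a constant)
        log_ml.append(0.5 * k * np.log(g / (1 + g))
                      - 0.5 * (N - 1) * np.log(ssr / (1 + g) + tss * g / (1 + g)))
        # Binomial model prior p(M_r) = theta^k (1 - theta)^(K - k)
        log_prior.append(k * np.log(theta) + (K - k) * np.log(1 - theta))
        subsets.append(idx)
    logpost = np.array(log_ml) + np.array(log_prior)
    post = np.exp(logpost - logpost.max())
    post /= post.sum()              # posterior model probabilities
    pip = np.zeros(K)               # posterior inclusion probabilities
    for p, idx in zip(post, subsets):
        for i in idx:
            pip[i] += p
    return post, pip
```

With theta = 0.5 the Binomial prior reduces to the Uniform prior over the model space, as noted in the text.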

MC 3 sampling algorithm
Our MC3 sampling algorithm is based on the Metropolis-Hastings algorithm and was originally developed by Madigan et al. (1995). It simulates a chain of models M(s) for s = 1, ..., N to find the equilibrium distribution p(M_r | y) of the posterior model probabilities. We proceed as follows. We define a candidate set of models consisting of the previously accepted model M(s−1), all models which delete one independent variable from M(s−1), and all models which add one independent variable to M(s−1). The chain is then constructed by drawing a candidate model M′ from this set, and the acceptance probability has the form

α(M(s−1), M′) = min { [p(y | M′) p(M′)] / [p(y | M(s−1)) p(M(s−1))], 1 }.    (17)

In order to assess the stability and convergence of the chain, we look at the Pearson correlation between the analytical and MC3 posterior model probabilities. Convergence is achieved if the correlation is above 0.99 (see Fernández et al. (2001b) and Koop (2003)). Note that we measure the correlation between the analytical and MC3 posterior model probabilities only for the top ranked models. If the number of top ranked models is very small, this may lead to a high value of the Pearson correlation even when convergence has not been achieved.
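The MC3 scheme can be sketched compactly in Python. This is an illustrative simplification, not the package's Hansl implementation: the candidate here flips the inclusion of one randomly drawn regressor, and the marginal likelihood assumes the same g-prior setup as above.

```python
import numpy as np

def mc3(y, X, g, n_rep=20000, burn=2000, theta=0.5, seed=1):
    """MC3 sampler over the model space: a Metropolis-Hastings chain whose
    candidate flips the inclusion of one randomly chosen regressor
    (a simplified variant of the neighbourhood of Madigan et al. (1995))."""
    rng = np.random.default_rng(seed)
    N, K = X.shape
    yt = y - y.mean()
    Xt = X - X.mean(axis=0)
    tss = yt @ yt

    def log_post(gamma):
        # log marginal likelihood (g-prior) plus log Binomial model prior
        k = int(gamma.sum())
        if k == 0:
            ssr = tss
        else:
            Xr = Xt[:, gamma.astype(bool)]
            b, *_ = np.linalg.lstsq(Xr, yt, rcond=None)
            ssr = tss - yt @ Xr @ b
        lml = (0.5 * k * np.log(g / (1 + g))
               - 0.5 * (N - 1) * np.log(ssr / (1 + g) + tss * g / (1 + g)))
        return lml + k * np.log(theta) + (K - k) * np.log(1 - theta)

    gamma = np.zeros(K)              # start from the null model
    lp = log_post(gamma)
    visits = np.zeros(K)
    for s in range(n_rep):
        cand = gamma.copy()
        j = rng.integers(K)          # pick one regressor and flip its inclusion
        cand[j] = 1 - cand[j]
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp:   # Metropolis acceptance step
            gamma, lp = cand, lp_cand
        if s >= burn:                # discard burn-in replications
            visits += gamma          # visit frequencies estimate the PIPs
    return visits / (n_rep - burn)
```

Because the proposal is symmetric, the acceptance ratio reduces to the posterior odds of the candidate against the current model, as in (17).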

Jointness measures
The main applications of model averaging are concerned with the selection of variables when model uncertainty is present. Another relevant issue which arises in this framework is to identify whether different pairs of variables x_i and x_j are substitutes, complements or neither over the model space. For that reason, Ley and Steel (2007) and Doppelhofer and Weeks (2009) define ex-post jointness measures of dependence between different sets of explanatory variables. According to Ley and Steel (2007), the logarithm of the jointness statistic has the form

J_LS = ln [ p(i ∩ j | y) / (p(i | y) + p(j | y) − 2 p(i ∩ j | y)) ],    (18)

where p(i ∩ j | y) represents the sum of the posterior probabilities of those models that contain both variables x_i and x_j, and p(i | y) and p(j | y) are the posterior inclusion probabilities of x_i and x_j. J_LS can be interpreted as the posterior odds ratio of the models including both i and j versus the models that include them only individually (see Ley and Steel (2007)).
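Given the posterior model probabilities, both this statistic and the Doppelhofer-Weeks alternative discussed next reduce to simple bookkeeping over the visited models. A Python sketch (illustrative only, not part of the package):

```python
import numpy as np

def jointness(post, subsets, i, j):
    """Compute the Ley-Steel and Doppelhofer-Weeks jointness statistics for
    variables i and j.  post[r] is the posterior probability of model r and
    subsets[r] is the set of variables included in model r."""
    p_ij = sum(p for p, s in zip(post, subsets) if i in s and j in s)
    p_i = sum(p for p, s in zip(post, subsets) if i in s)
    p_j = sum(p for p, s in zip(post, subsets) if j in s)
    p_i_not_j = p_i - p_ij               # i included, j excluded
    p_j_not_i = p_j - p_ij               # j included, i excluded
    p_neither = 1.0 - p_i - p_j + p_ij   # neither i nor j included
    # Ley-Steel: odds of joint inclusion vs inclusion of only one of the two
    j_ls = np.log(p_ij / (p_i_not_j + p_j_not_i))
    # Doppelhofer-Weeks: log cross-product ratio of the 2x2 inclusion table
    j_dw = np.log((p_ij * p_neither) / (p_i_not_j * p_j_not_i))
    return j_ls, j_dw
```

For example, with posterior mass 0.5 on the model containing both variables and 0.2 on each model containing exactly one of them, both statistics are positive, indicating complementarity.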
An alternative jointness measure was proposed by Doppelhofer and Weeks (2009). It can be written as

J_DW = ln [ (p(i ∩ j | y) · p(ĩ ∩ j̃ | y)) / (p(i ∩ j̃ | y) · p(ĩ ∩ j | y)) ],    (19)

where p(ĩ ∩ j̃ | y) denotes the sum of the posterior probabilities of the regression models in which neither x_i nor x_j is included, p(i ∩ j̃ | y) corresponds to the sum of the posterior probabilities of all the models in which x_i is included and x_j is excluded, and the last probability, p(ĩ ∩ j | y), is defined accordingly. J_DW corresponds to the posterior odds of including i given that j is included, divided by the posterior odds of including i given that j is not included (see Doppelhofer and Weeks (2009)). According to Doppelhofer and Weeks (2009), we use the following classification of jointness among variables:

Evidence                     Jointness statistic
strong substitutes           J_DW < −2
significant substitutes      −2 < J_DW < −1
not significantly related    −1 < J_DW < 1
significant complements      1 < J_DW < 2
strong complements           J_DW > 2

Implementation in gretl

In this section, we document the code as well as the use of the gretl package for Bayesian model averaging, together with the accompanying jointness statistics. First, we will describe our code and the use of the graphical interface; then we will present how to use our BMA script.
At the end we will present the outputs that are returned.

Hansl programming language
"Hansl (the name expands, in recursive fashion, to 'Hansl's a neat scripting language') is gretl's scripting language" (Cottrell and Lucchetti, 2013a, p. 1). Hansl's syntax is very similar to the C language, including passing pointers to functions. What is very useful for the end user is that Hansl provides a nice mechanism for building GUI interfaces to functions/packages. Such packages consist of (at least) one "public" function and zero or more "private" helper functions (see Cottrell and Lucchetti (2013c)). This distinction gives programmers flexibility in writing packages for gretl and allows them to split the code into small pieces (functions) responsible for logically separated computations.
The BMA package consists of 1 public and 17 private functions, but only 16 of them are used regularly. The name of each function starts with the "BMA_" prefix.

3.2 The core of the BMA code
The big_list is a gretl "named list" object, which is just a set of K + 1 variables (identified by names or dataset IDs). Importantly, the first member of big_list is treated as the y variable and the rest of the members are treated as the K explanatory variables. Furthermore, big_list cannot contain const (gretl's internal and automatically generated constant term).
The scalar av_model_size[0::] holds the prior average model size. Note: if av_model_size = K/2 and the prior is set to Binomial, we get the Uniform prior on the model space. The scalar alpha[0:1:0.6] is the significance level used in the OLS estimation of the initial model. An independent variable enters the initial model if its p-value is less than the significance level (see Section 2.2). The default value is α = 0.6, but setting α = 1 results in an initial model consisting of 0 to K randomly chosen explanatory variables.
The int l_rank[2::10] is the number of top ranked models; the default value is 10.
The int do_joint[0:2:0] indicates whether the jointness analysis is performed and, if so, which measure is used; the default value is 0 ("None").
The int Nrep[1000000] is the total number of replications in the Monte Carlo simulation; the default value is 1,000,000.
The int verbosity[1:2:1] indicates the level of verbosity of the BMA package when results are printed; the default value is 1 (basic output).

The main loop
The main loop of the BMA package is split into four parts. In the first part (Setting up the MC3 sampling algorithm), we set up internal variables and check the correctness of the parameters passed to the package. We use two private functions here. The function BMA_initial_model returns X_old_l, the list of explanatory variables in the initial model selected according to the scalar *alpha. Next, the function BMA_new_X_matrix constructs X_new, the matrix of demeaned explanatory variables based on X_old_l.
Next, the X_new matrix is passed to the function BMA_matrix_precompute, which performs the linear algebra computations necessary to evaluate formulas (8)-(11). We then run a code snippet to compute formula (11), followed by the function BMA_ols, which computes formulas (8)-(10). Finally, we call the function BMA_model_structure, which returns a 1 × K row vector with a 1 if a given explanatory variable was in the initial model and a 0 otherwise.
In the third part of the main loop (Markov Chain Monte Carlo simulation), we discard the first Nburn = round(burn/100*Nrep) draws as burn-in replications and then simulate a chain of models. The most important steps are as follows. First, we draw a candidate model: we draw the number of a variable ranging from 0 to K using gretl's built-in function randint(), which uses the SIMD-oriented Fast Mersenne Twister (SFMT) RNG (see Cottrell and Lucchetti (2013b); Yalta and Schreiber (2012)). If the drawn variable was in the last model, it is removed from it; otherwise it is added to the last model.
Second, we decide whether to accept the new draw (model) or not. We call the BMA_accept_prob function, which implements the priors described earlier.
Third, we call the BMA_build_rank function, which is responsible for creating the analytical as well as the numerical rankings.
Fourth, we do some counting needed for the essential BMA computations formulated in (15)-(16), that is, the mean and variance of the posterior distribution of the slope parameters, as well as the average model size and the posterior inclusion probability (PIP).
Finally, if the jointness analysis was chosen, we call the BMA_jointness_matrix function, which counts each coexistence (jointness) of every pair of explanatory variables in the given draw.
In the last part of the main loop (Results printing), we call the BMA_print_results function in order to print the MC3 sampling results. A detailed description of the structure of the printed results is given in Section 3.3.

The matrix returned by BMA package
The BMA package can optionally return a matrix containing the main results obtained in the analysis. The structure of that matrix is shown in Figure 1. The result matrix has K rows, one for each explanatory variable. The first five columns are: PIP, Mean, Std.Dev., Cond.Mean and Cond.Std.Dev. (see the Outputs subsection for details). The next K columns appear only if a jointness analysis was selected, and contain the values of one of the bivariate jointness measures: Ley-Steel or Doppelhofer-Weeks.
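The layout of the returned matrix can be illustrated with a small Python sketch. All numbers below are made-up placeholders; only the shape and column order mirror the description above, and the sketch uses the identity Mean = PIP × Cond.Mean:

```python
import numpy as np

K = 3  # number of explanatory variables (placeholder)
pip = np.array([0.95, 0.40, 0.10])        # posterior inclusion probabilities
mean = np.array([1.20, 0.30, 0.01])       # unconditional posterior means
sd = np.array([0.35, 0.40, 0.08])         # unconditional posterior std. devs
cond_mean = mean / pip                    # mean conditional on inclusion
cond_sd = np.array([0.20, 0.25, 0.07])    # std. dev conditional on inclusion

# First five columns: PIP, Mean, Std.Dev., Cond.Mean, Cond.Std.Dev.
result = np.column_stack([pip, mean, sd, cond_mean, cond_sd])

with_jointness = True
if with_jointness:
    J = np.zeros((K, K))                  # pairwise jointness statistics
    result = np.hstack([result, J])       # K extra columns appended

print(result.shape)                       # (K, 5) without jointness, else (K, 5 + K)
```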

According to Figure 2, we can specify the following entries in the GUI BMA window:

• List of all variables for BMA (Y must be the first one) - Loads variables from the database, which must have been opened previously. The dependent variable must be the first one on the list of the currently available variables. Notice that by default we assume that you want to estimate an intercept; therefore, a constant is implicitly included in the list of the variables.
• Prior - Indicates the choice of model prior. One can employ the Binomial model prior or the Binomial-Beta model prior. Note that the Uniform model prior is a special case of the Binomial model prior; therefore, in fact, our package allows three types of priors.
• Prior average model size - Specifies the prior expected model size E(Ξ). Note that for the Binomial model prior and E(Ξ) = 0.5K one can obtain the Uniform prior on the model space.
• Significance level for the initial model - Defines the significance level used to build the initial model. An explanatory variable enters the initial model if its p-value is less than the significance level. If the significance level equals 1, the initial model will be randomly chosen (with equal probability) from all available models. Note that if all available explanatory variables enter the initial model, you will get the following gretl error message: "No independent variables were omitted".
• Number of the top ranked models -Specifies the number of the best models for which information is stored.
• g-prior type - Here one can choose between the four Zellner g-priors for the regression coefficients: the Benchmark prior, the Unit Information Prior (g-UIP), the Risk Inflation Criterion (g-RIC) and the Hannan-Quinn prior (g-HQ).
• Jointness analysis - If 'None' (the default), the jointness analysis is omitted. Otherwise, one can choose the jointness measure of Ley and Steel (2007) or Doppelhofer and Weeks (2009).
• Total number of replications -Defines the total number of iteration draws to be sampled.
• Percentage of burn-in draws - Provides the number of burn-in replications, calculated as a percentage of the total number of iteration draws.
• Verbosity - An integer ranging from 1 to 2; the default is 1, which prints the basic Bayesian model averaging results. If verbosity equals 2, a more detailed description of the analysis is provided (initial model, speed of convergence, estimation results for the top ranked models).
• matrix -You can save the output under a specified name to the current session.

Script
The BMA package can also be used inside a Hansl script. A minimal example could be as follows:

open greene9_1.gdt
include BMA.gfn
list green = dataset
BMA_main(green, 1, 1.5, 0.6, 4, 1, 0, 1000000, 10, 1)

The above example consists of three blocks. The first block opens the greene9_1 dataset, which is bundled with every standard gretl installation. This dataset contains cross-sectional data on the manufacturing of transportation equipment, presented as Table 9.1 in Greene (1999).
The second block is the definition of the green list, which contains all variables available in the greene9_1 dataset. The first variable, valadd, will be the dependent variable.
The third block is the call that defines the BMA analysis: Binomial prior, prior average model size set to 1.5, significance level for the initial model set to 0.6, 4 top ranked models, Benchmark g-prior, no jointness analysis, 1,000,000 replications with 10% burn-in draws and basic output (verbosity set to 1).

Outputs
If you select the appropriate entries in the GUI BMA window, our package returns the posterior inclusion probabilities (PIP), the posterior mean and standard deviation of each coefficient (Mean and Std.Dev.), and the posterior mean and standard deviation of each coefficient conditional on the variable being included in the model (Cond.Mean and Cond.Std.Dev.).
Let us consider the data used in Fernández et al. (2001b) (FLS hereafter). These data comprise information about 72 countries and 41 potential growth determinants for the period 1960 to 1992. For example, for the FLS data the following estimates should appear. The BMA estimation function accepts a scalar which sets the verbosity of the output. Its default value is 1, which causes the estimation output to be printed. The value 2 forces the BMA function to print all the details of the estimation. In addition to the above-mentioned results, the following information is then printed: the total CPU time, the type of model prior, the prior average model size, the significance level for the initial model, the type of g-prior, the total number of iterations and, finally, the number of burn-in draws. Moreover, the BMA estimation function produces the estimation results for the initial and top ranked models. The jointness analysis is inactive by default. If it is activated, you will also get: the posterior joint probability of the explanatory variables, the jointness statistic (18) or (19), and the classification of the jointness measures. The jointness analysis for the previous example should look like this:

Empirical illustration
In this section, we examine the ability of our package to replicate the results published by Fernández et al. (2001b) and Ley and Steel (2007). We use the same original dataset to attempt to replicate their results. In our empirical illustration, we discard the first 1 million models and draw samples from the model space 2 million times. We specify the following entries in the GUI BMA window: prior = 'Binomial', prior average model size = '20.5' (i.e. we set the model prior to the uniform distribution), number of the top ranked models = '20', g-prior type = 'Benchmark prior', total number of replications = '3000000', percentage of burn-in draws = '33'. Table 1 presents the estimation results. This table also reports the posterior means and standard errors of the regressors calculated with the BMS package of Zeugner (2012) and the results published in Ley and Steel (2007). These benchmark results allow us to compare and analyse the performance of our package.
As is apparent from Table 1, the gretl package is reasonably successful at matching the results reported in Ley and Steel (2007). All the PIPs and the estimated posterior means and standard deviations are reasonably close in all cases, and the same variables are identified as relevant. Note that the gretl package results are almost identical to the results produced by the BMS package. The only minor differences in posterior results are found between them and the results published in Ley and Steel (2007).

Conclusions
This paper has outlined a new software package that implements Bayesian model averaging analysis and jointness measures for gretl. Bayesian model averaging is a straightforward and natural extension of standard Bayesian analysis, and it is a useful and popular alternative to other variable selection procedures, especially for a large set of regressors. Here we used gretl, a free, open-source software package for econometric analysis with an easy-to-use GUI. Our goal was to familiarize potential users with the features and the different options that our package has to offer. We described how our package implements a BMA analysis, as well as the outputs that are returned.

Figure 1: Structure of the matrix returned by the BMA package

Figure 2: Main window for BMA

The parameter checking and setup stage uses the following private functions:

1. function string BMA_parse (list big_list, const scalar *k, const scalar *av_model_size, const scalar *l_rank)
2. function void BMA_scaling_factors (matrix *factors, const scalar *k, scalar *y_sq, const int g_type, const matrix *Y)

where arguments indicated by the * modifier are pointers (see Cottrell and Lucchetti (2013c) for an explanation). If there is no error, we run the function BMA_scaling_factors, which calls the function scalar BMA_gprior (const scalar *k, int type) to compute the g-prior according to formulas (4)-(7) and sets up some scalars needed for further computations. In the second part of the main loop (Starting model), we construct the initial model for the MC3 sampling and set up some additional internal variables; here we use five private functions.

Table 1: Performance of the gretl BMA package for the FLS data. The dependent variable is the growth rate over 1960-1996 across 72 countries. All the regressors have been standardized to have zero mean and unit variance.