Data Cloning: Maximum Likelihood Estimation of DSGE Models

In this article, we present evidence supporting the use of the data cloning method of Lele et al. (2007) and Lele et al. (2010) for maximum likelihood estimation of DSGE models. In the data cloning method, maximum likelihood estimators are obtained as the limit of Bayesian estimators, taking advantage of existing computational estimation methods based on Markov Chain Monte Carlo. We show the desirable properties of this methodology based on some experiments. In the first experiment, we estimate the Cash-in-Advance model of Schorfheide (2000) and compare the results with the most important methods of numerical optimization used in maximum likelihood estimation. The data cloning estimators show low variability, unlike the maximum likelihood estimators, suggesting greater robustness of the former to the problem of initial conditions. In the second experiment, we show that the data cloning estimators also present good results in terms of bias and RMSE. The proposed procedures also allow checking the econometric identification of parameters and determining the appropriate number of clones (replications of the original sample).


Introduction
Dynamic Stochastic General Equilibrium (DSGE) models have attracted growing interest in the field of macroeconomics. Due to computational advancements, these models have become increasingly more complex and accurate, allowing for an in-depth analysis of the relationships between real and nominal economic variables.
Briefly, a model of this class can be described by the following equations: a Taylor rule, i.e., a monetary policy rule in which the interest rate is the central bank's instrument for invigorating the economy; a dynamic IS curve, with expectations; and a New-Keynesian Phillips curve, which also considers the expectations of individuals. From the monetary policy rule and the dynamic IS curve, we obtain aggregate demand. Aggregate supply is given by the New-Keynesian Phillips curve. Economic equilibrium is achieved by the relationship between aggregate supply and aggregate demand curves. The estimation of these models is directly based on the solution of dynamic equilibrium, involving steps to find this solution and subsequent parameter estimation based on the representation of the equilibrium of these models. DSGE models, albeit highly intuitive, are problematic in at least two ways, according to Canova (2007). First, this type of model provides only an approximation to the data generating process, since the vector of the structural parameters generally has a low dimension and thus strong restrictions are imposed in the short and in the long run. Secondly, the number of exogenous variables is usually smaller than the number of endogenous variables, and so the covariance matrix of the endogenous variables becomes singular. Because of these characteristics, the estimation and testing of DSGE models by traditional methods (e.g., maximum likelihood) is very complex. In addition, stochastic singularity prevents numerical routines based on the Hessian from working properly and consequently the maximum of the objective function from being reached, as discussed in Ingram et al. (1994) and Ireland (2004).
A possible solution to these problems is to impose artificial measurement errors for some of the endogenous variables, making the covariance matrix nonsingular, as used in Bencivenga (1992), DeJong et al. (2000a), DeJong et al. (2000b), Ireland (2001), Kim (2000) and Schorfheide (2000). As discussed in Ireland (2004), this strategy allows identifying additional shocks to those observed in real business cycle models and determining how important these shocks are to aggregate fluctuations. However, this strategy has some drawbacks: its implementation requires extending the theoretical model by using ad hoc choices. If these shocks are not important in the theoretical model, their use can introduce noise in the estimation process, leading to incorrect specification, reducing the efficiency of the estimates and the power of hypothesis tests on the parameters.
Another difficulty associated with the estimation of DSGE models using likelihood-based methods is the shape of the likelihood function. The likelihood function of DSGE models is a nonlinear function of parameters and latent variables, obtained from the numerical solution of a dynamic optimization problem. As discussed in Ruge-Murcia (2007) and DeJong and Dave (2007), the shape of the likelihood function can generate nontrivial estimation problems. In the model proposed in Ruge-Murcia (2007), the log-likelihood function is flat with respect to the discount factor, preventing the correct assessment of this parameter. As mentioned in DeJong and Dave (2007), the presence of cliffs in the likelihood function can be problematic, since it generates points of non-differentiability, thus violating one of the regularity conditions of maximum likelihood estimators (van der Vaart (1988), Chernozhukov and Hong (2004)). Although some solutions have been proposed, such as the use of methods that are more robust to discontinuities (e.g. Chris Sims' CSMINWELL algorithm) and the imposition of transformations and restrictions on parameters so as to change the curvature of the likelihood function (e.g. DeJong and Dave (2007)), they do not effectively solve the problems of non-regularity in the likelihood function of DSGE models.
The literature on the estimation of DSGE models focuses on Bayesian estimation methods, usually Markov Chain Monte Carlo (MCMC) methods. According to Herbst (2011) and Herbst and Schorfheide (2013), 95% of papers published in the top economics journals from 2005 to 2010 use MCMC methods to perform Bayesian estimation of DSGE models. Although part of the motivation for the use of Bayesian methods is related to the dynamic Bayesian learning process in macroeconomic models, as discussed in Poirier (2012), it is possible to speculate that this choice is mostly due to the computational advantages of this method and to the possible use of prior information, which is especially relevant in macroeconomic models with limited sample sizes available. Canova (2007) presents yet another advantage of using Bayesian methods for the estimation of DSGE models: posterior distributions incorporate uncertainty about the parameters and model specification, making them more attractive to macroeconomists. For these reasons, the vast majority of articles in the literature uses Bayesian methods, especially the MCMC algorithms, to estimate DSGE models.
The regularity conditions of MCMC-based Bayes estimators are less restrictive than those of maximum likelihood estimation, as discussed in Robert and Casella (2004) and Chernozhukov and Hong (2004). MCMC-based estimators are valid under general conditions that do not depend on the continuity of the likelihood function. These estimators can be thought of as solutions to global convex optimization problems, avoiding the problems caused by local maxima in numerical likelihood optimization algorithms. The MCMC methods replace the optimization problem with a corresponding problem of expectations calculation (posterior mean of Markov chains) that is computationally more robust. Other problems with non-regular likelihood models are discussed in Chernozhukov and Hong (2004), for instance, the fact that the asymptotic theory is not standard and overall statistics are not sufficient in this context. Through Bayesian methods, it is possible to efficiently infer the posterior distribution of the parameters and volatility, even when the covariance matrix of the vector of the endogenous variables is singular, since the Hessian is not necessary. However, these algorithms also have problems in the estimation of DSGE models. As we will show in the next sections, there is a great dependence on the choice of the prior distribution. This means that if this distribution is informative, the results might be affected. This variability can be problematic, since different prior distributions can lead to different results and conclusions about the object of analysis.
The procedure proposed in this paper allows constructing maximum likelihood estimators as the limit of Bayes estimators, from MCMC methods, using the data cloning method proposed in Lele et al. (2007) and Lele et al. (2010). While the construction of likelihood estimators using Bayesian methods is already well established in the literature of asymptotic inference (Walker (1969), Ibragimov and Has'minski (1981), van der Vaart (1988)), the construction of estimators based on data cloning is computationally simpler and allows the use of MCMC estimation algorithms of DSGE models with a slight modification. Basically, it is only necessary to build a vector containing k replications of the original sample and to make a minor change in the calculation of the Fisher information matrix. It is also easy to show that this method is computationally efficient, even in dynamic models and in the presence of latent variables, as shown in Jacquier et al. (2007) and Laurini (2013). Under standard regularity conditions (Lele et al. (2010)), with a sufficiently large k, these estimators converge to the maximum likelihood estimators.
As discussed in Lele et al. (2010), the use of data cloning has several other advantages. The algorithm avoids problems in numerical procedures for maximizing the likelihood function, replacing them with MCMC sampling procedures, and thus it is unnecessary to calculate the derivative of the likelihood function of the variance-covariance matrix. This method also avoids the use of diffuse or non-informative priors to provide equivalent frequentist inference, which can be computationally and theoretically problematic, as discussed in Robert and Casella (2004) and Seaman et al. (2012). This paper is structured as follows. Section 2 presents the data cloning method for obtaining a maximum likelihood estimator that is more robust to the usual problems. In section 3, we show evidence of robustness of the data cloning method when compared to usual maximum likelihood estimators. Section 4 aims to support the use of data cloning for the estimation of DSGE models. Section 5 assesses identification problems. Section 6 presents the prior feedback method, i.e. a computationally efficient way to implement the data cloning method. Section 7 briefly describes implementation details. Finally, section 8 concludes.

The Data Cloning Method
Consider the following (hierarchical or latent factor) model:

Y ∼ f(y | X, ϕ) (1)

X ∼ g(x | θ) (2)

where Y = (Y_1, Y_2, ..., Y_n) is the vector of observations, f and g are joint probability density functions, X is a vector of latent factors, ϕ = (ϕ_1, ϕ_2, ..., ϕ_q) is a vector of fixed unknown parameters that directly affect the observations Y, and θ = (θ_1, θ_2, ..., θ_p) is a vector of fixed unknown parameters related to the latent state process X.
The likelihood function for the model described by equations (1) and (2) is given by:

L(θ, ϕ; y) = ∫ f(y | X, ϕ) g(X | θ) dX (3)

where y represents realizations of the random variable Y. The maximum likelihood estimates of the parameters, defined as:

(θ̂, ϕ̂) = argmax_(θ,ϕ) L(θ, ϕ; y) (4)

are the values of (θ, ϕ) = (θ_1, θ_2, ..., θ_p; ϕ_1, ϕ_2, ..., ϕ_q) that jointly maximize the likelihood function (3). This maximization, however, is not simple, since it involves the elevated computational cost of high-dimensional integration: the latent variables must be marginalized out of the likelihood function. The use of Bayesian methods circumvents this problem. The parameters (θ, ϕ) are no longer fixed and unknown, and become random variables. The prior distribution, i.e. the joint distribution of these random variables, is a measure of the analyst's beliefs about the different values of the parameters before observing the data. Its combination with the likelihood function, using Bayes' Theorem, generates the posterior distribution, which represents the update of the analyst's beliefs about the different values of the parameters after observing the data.
Defining the prior distribution as π(θ, ϕ), the posterior distribution is given by:

h(θ, ϕ, X | y) = f(y | X, ϕ) g(X | θ) π(θ, ϕ) / C(y) (5)

where C(y) is a normalization constant. The marginal posterior distribution of the parameters, π(θ, ϕ | y), is then obtained by integrating the posterior distribution h(θ, ϕ, X | y) over X. Through MCMC algorithms, it is possible to generate random numbers from the posterior distribution using only the numerator of equation (5), avoiding integration in multiple dimensions.
The estimates from the MCMC algorithms, however, are also problematic, considering that there is a large dependence on the choice of the prior distribution in small samples, as typically observed in the estimation of DSGE models. In other words, if the structure is highly informative, the results might be affected, which may lead to different conclusions about the object of analysis. To take advantage of MCMC computational methods while obtaining maximum likelihood estimators, which do not depend on the use of prior information, one possibility is to create estimators that behave as the limit of Bayesian estimators. A simple way to obtain estimators with these properties is the use of data cloning, developed by Lele et al. (2007) and Lele et al. (2010).
To understand this method, assume a hypothetical situation in which an individual simultaneously performs k independent statistical experiments, all producing exactly the same results, y. The new likelihood function is now given by the original function, L(θ, ϕ; y), raised to the k-th power: [L(θ, ϕ; y)]^k. Additionally, assume that this individual obtains a posterior distribution h^(k)(θ, ϕ, X | y) and a marginal posterior distribution π^(k)(θ, ϕ | y), using the prior distribution π(θ, ϕ) and the likelihood function with k clones of the data. Formally, the original marginal posterior distribution is given by:

π(θ, ϕ | y) = L(θ, ϕ; y) π(θ, ϕ) / C(y) (6)

where C(y) = ∫∫ L(θ, ϕ; y) π(θ, ϕ) dθ dϕ is a normalization constant. For the cloned sample, equation (6) can be rewritten as:

π^(k)(θ, ϕ | y) = [L(θ, ϕ; y)]^k π(θ, ϕ) / C(k, y) (7)

Lele et al. (2007) prove that if k is large, the marginal posterior distribution π^(k)(θ, ϕ | y) is concentrated around the maximum likelihood estimates. The likelihood function with k clones of the data has the same maximum point as the original function, (θ̂, ϕ̂), and its Fisher information matrix is k times the information matrix based on the original likelihood function.
Assuming the usual identification conditions of the maximum likelihood estimator, it is possible to show that the marginal posterior distribution of the parameters, from the sample containing k replications of the original data, is given by:

π^(k)(θ, ϕ | y) = [L(θ, ϕ; y)]^k π(θ, ϕ) / C(k, y) (8)

where C(k, y) = ∫∫ [L(θ, ϕ; y)]^k π(θ, ϕ) dθ dϕ is a normalization constant. Lele et al. (2010) prove that, under regularity conditions and for a large number of clones, π^(k)(θ, ϕ | y) converges to a multivariate Normal distribution with mean equal to the maximum likelihood estimate and variance equal to (1/k) I⁻¹(θ̂, ϕ̂), where I(θ̂, ϕ̂) is the Fisher information matrix corresponding to the original likelihood function. Formally:

π^(k)(θ, ϕ | y) → N((θ̂, ϕ̂), (1/k) I⁻¹(θ̂, ϕ̂)) as k → ∞ (9)

Evidently, in practice we do not have k independent experiments, but we can proceed by using k replicates of the original sample (clones). In this situation, according to Lele et al. (2007) and Lele et al. (2010), we do not have the convergence in probability used in Walker (1969), but rather the deterministic convergence of a sequence of functions.
The results of Lele et al. (2007) and Lele et al. (2010) show that it is possible to sample a sequence of values (θ, ϕ)_j, j = 1, ..., J, from the marginal posterior distribution π^(k)(θ, ϕ | y), and obtain the maximum likelihood estimator as the average of these values, as well as the variance-covariance matrix of the estimator as k times the variance-covariance matrix of this sample:

(θ̂, ϕ̂) ≈ (1/J) Σ_j (θ, ϕ)_j (10)

Var(θ̂, ϕ̂) ≈ k Var[(θ, ϕ)_j] (11)

Proofs of these properties can be found in Lele et al. (2010) and Baghishani and Mohammadzadeh (2011). Note that several Monte Carlo optimization procedures, based on a similar convergence of the posterior distribution to the maximum likelihood estimator, had already been discussed in the literature, e.g. Pincus (1968) and D'Epifanio (1996). See also the discussion of the Prior Feedback method in Robert and Casella (2004), pp. 169-173, and in Section 6.
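To make the recipe concrete, the following toy example (our own, not from the paper) applies data cloning to a Normal model with known variance, where the maximum likelihood estimator of the mean is simply the sample average. A random-walk Metropolis chain is run on the likelihood raised to the k-th power; the posterior mean gives the point estimate and k times the posterior variance approximates the inverse Fisher information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n = 50 draws from a Normal(mu, sigma^2) with sigma^2 = 4 known;
# the MLE of mu is the sample mean, which gives a benchmark to check against.
y = rng.normal(loc=1.5, scale=2.0, size=50)
k = 50  # number of clones

def log_post_cloned(mu, y, k, prior_mu=0.0, prior_sd=10.0):
    """Unnormalized log posterior with the likelihood raised to the k-th power
    (equivalent to stacking k copies of the sample)."""
    loglik = -0.5 * np.sum((y - mu) ** 2) / 4.0
    logprior = -0.5 * (mu - prior_mu) ** 2 / prior_sd ** 2
    return k * loglik + logprior

# Random-walk Metropolis on the cloned posterior
draws = np.empty(20000)
mu, lp = 0.0, log_post_cloned(0.0, y, k)
for i in range(draws.size):
    prop = mu + rng.normal(scale=0.05)
    lp_prop = log_post_cloned(prop, y, k)
    if np.log(rng.uniform()) < lp_prop - lp:
        mu, lp = prop, lp_prop
    draws[i] = mu

post = draws[10000:]        # discard burn-in
dc_mle = post.mean()        # data cloning point estimate: posterior mean
dc_var = k * post.var()     # k times posterior variance approximates I^{-1}

print(dc_mle, y.mean())     # the two should be close
print(dc_var, 4.0 / len(y)) # asymptotic variance sigma^2 / n
```

In the DSGE application the same logic applies, with the log-likelihood of the state-space model (evaluated by the Kalman filter) replacing the Normal log-likelihood at each Metropolis step.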
As pointed out in Lele et al. (2010), one should note that data cloning methods are only an approximation mechanism to obtain maximum likelihood estimators using Bayesian estimators. Thus, the usual limits and finite-sample properties of likelihood estimators are still valid. These methods are not meant to reduce the bias of maximum likelihood estimators.
The quasi-Bayesian estimators proposed in Chernozhukov and Hong (2003) and Chernozhukov and Hong (2004) are also related to the data cloning method, using MCMC to construct maximum likelihood estimators. In this case, a Random Walk Metropolis-Hastings algorithm is applied to a Laplace approximation of an objective function, such as a likelihood function. The main difference is that this method is directly equivalent to the maximum likelihood estimator only when using weighting functions equivalent to improper priors. In the case where (weighted) proper priors are used, the quasi-Bayesian estimator is only asymptotically equivalent to the maximum likelihood estimator; since these estimators are based on the original data sample without replications, they still carry the information of the pseudo-priors used. In the event that these pseudo-priors are flat, mimicking the maximum likelihood estimator, the method can suffer from computational problems found in Bayesian estimators, exploring the likelihood function very inefficiently and requiring a very large number of samples to obtain estimators with adequate accuracy in the approximation of the likelihood function. Note that the concept of data cloning can be used in the estimator proposed in Chernozhukov and Hong (2003) and Chernozhukov and Hong (2004), applying the cloning procedure to the sample used in the Laplace transformation of the objective function, as stated in Corollary 5.11 in Robert and Casella (2004), which discusses the theoretical properties of the Prior Feedback method.
In some preliminary experiments, we noted that the method of Chernozhukov and Hong (2003) and Chernozhukov and Hong (2004) was inaccurate and unstable in the estimation of the model used in this article. In order to obtain a more computationally stable estimation, it was necessary to use informative weights (pseudo-priors), which did not allow a direct comparison with the data cloning method, which eliminates the influence of priors. Although the method proposed in Chernozhukov and Hong (2003) and Chernozhukov and Hong (2004) is more general than the data cloning method, as it can be used for any type of econometric objective function, these computational limitations favor the use of data cloning. The fact that there exists a vast literature on the estimation of DSGE models, as well as comprehensive computational implementations of general solution methods and of MCMC-based Bayesian estimation, is also favorable to the use of data cloning, due to its ease of implementation.

Estimation Results
After introducing the data cloning method, we propose a comparative exercise with the usual and most important maximum likelihood estimation methods. To achieve this objective, we use the database of Schorfheide (2000) and the Cash-in-Advance model (hereafter CIA) presented by the author (model 1), briefly reproduced below.
The economy in this model is composed of a representative agent, a firm, and a financial intermediary. Output is determined by the following Cobb-Douglas function:

Y_t = K_t^α (A_t N_t)^(1-α) (12)

where Y_t is output at t, K_t denotes the capital stock, N_t is the measure of labor and A_t represents technology, evolving according to the following exogenous process:

ln A_t = γ + ln A_{t-1} + ε_{A,t} (13)

where ε_{A,t} ∼ N(0, σ²_A). In each period, the Central Bank adjusts the rate of growth of the money stock, disturbing the economy with a second exogenous process, according to the following monetary policy rule:

ln m_t = (1 - ρ) ln m* + ρ ln m_{t-1} + ε_{M,t} (14)

where ε_{M,t} ∼ N(0, σ²_M) and changes in m* and ρ are associated with rare regime shifts.
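The two exogenous processes can be simulated in a few lines. The sketch below assumes the standard forms in this model class (a unit-root technology in logs with drift γ and an AR(1) rule for money growth m_t) and uses illustrative parameter values, not the estimates of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (not the estimates from the paper)
gamma, rho, m_star = 0.005, 0.7, 1.01
sigma_A, sigma_M = 0.014, 0.003
T = 200

ln_A = np.empty(T)
ln_m = np.empty(T)
ln_A[0], ln_m[0] = 0.0, np.log(m_star)
for t in range(1, T):
    # technology: random walk in logs with drift gamma
    ln_A[t] = gamma + ln_A[t - 1] + rng.normal(scale=sigma_A)
    # money growth: AR(1) around its steady-state value m*
    ln_m[t] = (1 - rho) * np.log(m_star) + rho * ln_m[t - 1] + rng.normal(scale=sigma_M)
```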
At the beginning of period t, the representative agent inherits the money stock of the economy, M_t. It is easy to see, from equations (13) and (14), that all decisions of the agent reflect shocks (i.e. surprises) to the growth rates of money and technology. The representative agent determines the portion of the money stock deposited at the bank, D_t. The bank receives the deposits from the agent, on which it pays interest R_{H,t} - 1, receives the Central Bank's monetary injection, X_t, and grants loans to the firm at rate R_{F,t} - 1.
The firm then begins production, requiring hours of work from the representative agent. The amount borrowed from the bank is used to pay the wage bill W_t H_t to the agent, where W_t is the nominal wage per hour and H_t denotes the hours worked. The agent's cash balance in period t is therefore given by M_t - D_t + W_t H_t. It is important to note that, in the CIA model, the agent cannot leverage consumption with loans from the bank. Expenses must be financed by the agent's accumulated resources in that period.
At the end of period t, the agent receives dividends from the firm, F_t. He also receives from the bank the amount deposited at the beginning of the period (these deposits earn interest R_{H,t} - 1) and dividends, B_t.
In summary, in each period, the representative agent chooses the amount consumed, C_t, hours worked, H_t, and the money deposited in the bank, D_t, to maximize the present value of the sum of expected future utility, subject to the cash-in-advance constraint, where P_t is the price level. The maximization problem of the firm involves the choice of the capital stock of period t+1, K_{t+1}, labor demand, N_t, the payment of dividends, F_t, and the amount of loans, L_t. The financial intermediary solves a corresponding profit maximization problem. According to Schorfheide (2000), to solve the model one should derive the optimality conditions for these maximization problems. After detrending the variables, it is possible to find a deterministic steady state. The system can be log-linearized around it to obtain the solution by elimination of unstable roots. Using the notation of Schorfheide (2000), the log-linearized DSGE model admits a state-space representation in which a measurement equation relates the n × 1 vector of observable variables y_t to the state vector s_t through the matrices Ψ_1 and Ψ_*, and a transition equation describes the evolution of s_t through the matrices Ξ_0, Ξ_1 and Ξ_*, driven by the shocks ε_t = [ε_{A,t}, ε_{M,t}]' ∼ iid N(0, Σ_ε), where Σ_ε is the diagonal matrix with elements σ²_A and σ²_M and s_t is the vector of deviations of the detrended model from its steady state; the matrices Ξ_0, Ξ_1, Ξ_*, Ψ_1 and Ψ_* are functions of the structural DSGE model parameters.
In our experiment, we use the CIA model described above to produce estimates with 50 different initial values for the most important maximum likelihood estimation methods implemented in Dynare: the Nelder-Mead simplex method; Chris Sims' CSMINWELL algorithm, which is robust to cliffs in the likelihood function; the covariance matrix adaptation evolution strategy (CMA-ES), an evolutionary optimization algorithm developed for nonlinear non-convex optimization problems; and Marco Ratto's newrat method. All these algorithms are designed to be robust to poor initial values and discontinuities in the likelihood function. The results of our experiment are compared with data cloning estimates, with 50 and 100 replications. In the data cloning estimation, each different initial value is used in two ways: as the mean of the prior distributions and as the initialization of the MCMC algorithm. The structure of the prior distributions follows Schorfheide (2000). The random sampling of initial values is made by adding a uniform [-0.07, 0.07] error to the initial values assumed in Schorfheide (2000).
As mentioned earlier, the database in our experiment is fixed and simulated with the estimated values of Schorfheide (2000), and the same simulated database is used in the estimation with distinct initial values. Therefore, by using these parameters in the simulation we have a 'true value' of the vector of parameters to compare with the values estimated by the different methods and different initial values. At each replication, we change the vector of initial values, generating a random sample around the true vector of parameters to initialize the usual maximum likelihood estimates, the priors and the Markov chains in the data cloning method. In each replication, all methods use the same vector of initial values. The results in Table 1 show the mean and standard deviation of the parameters obtained in the n replications of the experiment, and in figure ?? we show the boxplots with the distributions of the estimated parameters. In the experiment with the likelihood methods, we used the point estimate of the optimization algorithm. The posterior mean of the Markov chains, with 20,000 samples, after discarding the first 10,000 samples, was used as the point estimator of the data cloning method.
After analyzing the table, it is easy to note significant differences in the estimation results obtained from the various maximum likelihood methods. This suggests a strong dependence of these methods on initial values. All numerical likelihood optimization methods used in this experiment have been designed to be robust, but in practice it is not possible to verify this property for the proposed model. The experiment results also show that the data cloning estimators have low variability and are closer to the true values used in the simulation, highlighting the robustness of this method to the problem of initial conditions. The variability in the data cloning estimators is of order 1/√n, where n is the number of MCMC samples, generated by the Monte Carlo approximation in the MCMC method.
It is also important to note that the computational cost favors the data cloning method. The MCMC sampling procedure is very efficient, allowing a low computational time for each sample. In the numerical optimization methods, the convergence procedure is time consuming and oftentimes it is necessary to restart the algorithm, especially in the presence of cliffs on the likelihood function (e.g. CSMINWELL).

Monte Carlo Evidence
Based on the favorable evidence concerning the data cloning method presented in the previous section, we conducted a new experiment, in order to support the use of this method in the estimation of DSGE models. We performed a standard Monte Carlo experiment to check the properties of the estimators obtained from simulations of the 'true model' samples, instead of comparing the effects of the vector of initial parameters using a fixed sample.
We simulated observations using the model of Schorfheide (2000), utilizing the same vector of parameters shown in section 3 as 'true' parameters, and estimated the parameters by the default maximum likelihood estimation method of Dynare (CSMINWELL) and by the data cloning algorithm with 50 replications of the original sample. We used two sets of initial values in each method, to check the sensitivity of the estimates: the 'true vector' of parameters, as the first vector of initial values, and a second vector of parameters, using the means of the priors assumed in the Bayesian estimation reported in Schorfheide (2000). The initial values are also used as the means of the priors in the data cloning method. In Table 2, we show the mean error (bias) and the root mean square error (henceforth RMSE) in relation to the parameters used in the simulation exercise, and in figure 2 the boxplots of the distributions of estimated parameters obtained in this experiment. Table 2 shows that, even when initializing the parameters with their true values, there is a significant difference in the results obtained with different initial values using the maximum likelihood method with numerical optimization. The estimation results for the data cloning method suggest low sensitivity to initial conditions and good results in terms of bias and RMSE, supporting the use of this method in the estimation of DSGE models.
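For reference, the two statistics reported in Table 2 are computed in the usual way; a minimal sketch with a hypothetical vector of Monte Carlo estimates (the function and array names are ours):

```python
import numpy as np

def bias_rmse(estimates, true_value):
    """Mean error (bias) and root mean squared error of a vector of
    Monte Carlo estimates of a single parameter."""
    err = np.asarray(estimates) - true_value
    return err.mean(), np.sqrt(np.mean(err ** 2))

# Hypothetical example: 500 Monte Carlo estimates of alpha scattered around 0.33
rng = np.random.default_rng(2)
alpha_hat = 0.33 + rng.normal(scale=0.02, size=500)
me, rmse = bias_rmse(alpha_hat, 0.33)
```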
Finally, we use the data cloning method with 50 and 100 replications to obtain maximum likelihood estimates of the model presented in Section 3, using the same database and priors used in Schorfheide (2000), and comparing the results with the posterior values estimated by Schorfheide (2000) using Bayesian MCMC estimation. Table 3 shows the parameters and standard deviations reported in Schorfheide (2000) and the results obtained by the data cloning estimators; in the table, Bayes corresponds to the Bayesian estimation reported in Schorfheide (2000), and dcmle to the data cloning estimation, using 50 and 100 replications (clones) of the original sample. It is easy to note that the results of the data cloning estimation are consistent with the general results obtained in Schorfheide (2000), except for the parameter φ. The value obtained for this parameter, however, is closer to one, which could indicate an outcome that is more consistent with economic theory.

Econometric Identification
A typical problem in the implementation and estimation of DSGE models is verifying the identification conditions, i.e. if there is only one vector of parameters consistent with the maximum of the likelihood function. Identification is one of the fundamental regularity conditions on maximum likelihood estimation, e.g. van der Vaart (1988). If this condition is violated, the Hessian of the optimization problem is singular or near-singular, causing problems in numerical optimization methods and in the calculation of the variance of the parameters. As discussed in Canova and Sala (2009), observational equivalence, partial and weak identification problems are widespread, difficult to identify empirically and may lead to biased estimation, unreliable hypothesis testing and to the choice of false models. Thus, it is essential to check the identification in DSGE models.
As discussed in Lele et al. (2010), it is possible to evaluate the identification of parameters directly through the data cloning procedure. According to the authors, a parameter is nonidentifiable if its variability does not converge to zero as the number of replications increases. This identification check can be easily performed graphically, using the standard deviations of the chains sampled in the MCMC procedure. We conducted this analysis for the estimates of the model of Schorfheide (2000), showing in Figure 3 (a) the standard deviation of all parameters as a function of the number of clones. In figure 3 (a), we normalized the standard deviations to enhance visualization. Through the analysis of this figure, it is clear that all parameters in our experiment are identified, since variability clearly decreases as the number of replications increases, showing convergence to the point mass at the maximum likelihood estimator. Lele et al. (2010) also discuss that a global identification measure may be obtained from the maximum eigenvalue of the covariance matrix of the parameters in each data cloning procedure. Similarly to the former identification scheme, the model is globally identified if the maximum eigenvalue converges to zero as the number of replications increases. Figure 3 (b) confirms the previous evaluation regarding the identification of the model, by showing that the maximum eigenvalue converges to zero with the increase in the number of replications. These two identification procedures are immediate by-products of the data cloning method, providing an empirical identification evaluation procedure for the model and responding to a fundamental problem in the estimation of DSGE models.
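Both diagnostics are direct by-products of the MCMC output. A sketch of how they could be computed, assuming a hypothetical dictionary `chains` that maps each number of clones k to its matrix of posterior draws (rows are draws, columns are parameters):

```python
import numpy as np

def identification_diagnostics(chains):
    """For each number of clones k, return the per-parameter posterior
    standard deviations (normalized by the smallest k) and the largest
    eigenvalue of the posterior covariance matrix. Identified parameters
    should show both quantities shrinking toward zero as k grows."""
    ks = sorted(chains)
    sds = {k: chains[k].std(axis=0) for k in ks}
    base = sds[ks[0]]
    norm_sd = {k: sds[k] / base for k in ks}
    max_eig = {k: np.linalg.eigvalsh(np.cov(chains[k].T)).max() for k in ks}
    return norm_sd, max_eig

# Synthetic check: fake 'posterior' samples whose spread shrinks like 1/sqrt(k),
# mimicking the behavior expected for identified parameters
rng = np.random.default_rng(3)
chains = {k: rng.normal(scale=1.0 / np.sqrt(k), size=(5000, 3)) for k in (1, 10, 50)}
norm_sd, max_eig = identification_diagnostics(chains)
```

Plotting `norm_sd` and `max_eig` against k reproduces the two graphical checks described above.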

Prior Feedback
Prior feedback is a complementary mechanism for the data cloning estimation (Robert and Soubiran (1993), Robert and Casella (2004)). This procedure uses an increasing number of replications, initializing the prior distribution of the estimation with k replications using the posterior distribution obtained from the estimation with k - 1 replications. The algorithm is initialized with one replication (the standard Bayesian estimation). The number of replications is increased until the parameters converge according to a predetermined measure (e.g. Robert and Casella (2004)). The great advantage of this estimation method is that it is more robust to possible convergence problems in the MCMC chains when the procedure is initialized directly with a large number of clones, as discussed in Robert (2010). Additionally, this estimation procedure inherits some robustness properties from simulated annealing optimization methods, as discussed in Robert and Soubiran (1993).
In this section, we propose a final analysis using the procedure described in the previous paragraph, with the original dataset and the initial vector of prior distributions of Schorfheide (2000). We declare convergence when the variation in the marginal likelihood obtained from the MCMC estimation between the models with k and k + 1 clones is less than 0.001, at which point the algorithm terminates. Figure 4 shows the value of the marginal likelihood of the model for each number of replications. With this criterion, convergence is obtained with only 21 clones. Other convergence metrics could be defined, such as a rule based on the variation in the parameters, similar to the function- and parameter-convergence criteria used in numerical optimization methods. The prior feedback procedure is advisable when the MCMC algorithm faces convergence problems, as it allows a local exploration of the likelihood function. It is also possible to use a more elaborate version of this algorithm, decreasing the number of clones and restarting the estimation if a problem is encountered in the exploration of the likelihood function with MCMC methods, as in the SAME algorithm proposed in Doucet et al. (2002); this version, however, is computationally more intensive.

Implementation Details
All of the analyses were implemented using Dynare (version 4.3.3, Adjemian et al. (2011)) and the Octave programming language (Eaton (2002)). The data cloning procedure was implemented by simply replicating the original data sample and stacking k clones. The Bayesian estimation was carried out in the conventional form, adjusting the calculation of the variance of the parameters as described in Section 2. The various numerical optimization methods were selected through the mode_compute option of the Dynare estimation procedure.
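The replication step itself is trivial. As a hypothetical sketch (not Dynare's internals), stacking k clones of a T × m data matrix amounts to:

```python
import numpy as np

T, m, k = 80, 2, 10                        # hypothetical sample size and clones
data = np.random.default_rng(2).normal(size=(T, m))

# Data cloning: stack k verbatim copies of the sample; the likelihood of the
# stacked sample equals the original likelihood raised to the power k.
cloned = np.tile(data, (k, 1))
print(cloned.shape)                        # (k * T, m)
```

The estimation code then treats the stacked sample as an ordinary dataset, which is why the method can reuse existing Bayesian machinery essentially unchanged.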
The calibration of the acceptance rate of the Metropolis-Hastings algorithm (mode computation) can be performed in the standard Bayesian estimation (no clones) and reused in the estimation with k clones, or carried out directly in the estimation with k clones; the results were essentially equivalent. In the prior feedback procedure, in addition to using the posterior obtained from the estimation with k − 1 clones as the prior for the estimation with k clones, we calibrated the acceptance rate of the Metropolis-Hastings algorithm using the result of the Bayesian estimation with the original sample (without clones), avoiding recalibration at each stage of the prior feedback procedure. The implementation of the data cloning procedure is quite simple and could easily become an additional maximum likelihood estimation option in Dynare, complementing the several methods already implemented.
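The acceptance-rate calibration can be sketched in isolation. The toy example below (a standard normal target, not a DSGE posterior, and a hypothetical target rate of 0.35) tunes a random-walk proposal scale toward the target acceptance rate; the tuned scale could then be reused across clone counts, as described above.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    return -0.5 * x * x          # toy log density: standard normal

def mh_acceptance_rate(scale, n=4000, x0=0.0):
    """Run a short random-walk Metropolis-Hastings chain; return acc. rate."""
    x, accepted = x0, 0
    for _ in range(n):
        prop = x + scale * rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x, accepted = prop, accepted + 1
    return accepted / n

# Calibrate the proposal scale once (e.g. on the no-clone posterior), then
# reuse it: raise the scale when accepting too often, lower it otherwise.
scale, target = 1.0, 0.35
for _ in range(15):
    rate = mh_acceptance_rate(scale)
    scale *= np.exp(rate - target)

print(f"calibrated scale={scale:.2f}, acceptance rate={rate:.3f}")
```

Because the cloned posterior shares the shape of the original posterior (only its scale shrinks), a scale calibrated without clones remains a reasonable starting point for the cloned runs, consistent with the essentially equivalent results reported above.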

Conclusions
In this paper, we presented the data cloning method of Lele et al. (2007) and Lele et al. (2010) to obtain a maximum likelihood estimator as the limit of a Bayesian estimator, showing evidence that this method is more robust than traditional maximum likelihood estimators.
This evidence was obtained from three experiments. In the first, we used the dataset of Schorfheide (2000) and the CIA model presented by the author in a comparative exercise with the most important likelihood estimation methods.
We found significant differences in the estimation results across these methods and great dependence on the vector of initial values. The data cloning estimators showed low variability, supporting the robustness of this method to the problem of initial conditions.
The second experiment aimed to support the use of the data cloning method in the estimation of DSGE models. Once again, the results suggested that the data cloning estimators are less sensitive to initial conditions and show good results in terms of bias and RMSE, unlike the traditional maximum likelihood estimators.
We showed evidence of identification of all parameters in our data cloning estimations, following Lele et al. (2010). We also evaluated convergence in a third experiment using the prior feedback procedure; convergence was declared when the variation in the marginal likelihood between successive numbers of clones fell below 0.001. By this criterion, only 21 clones were required to obtain convergence.