Improved Laplace Approximation for Marginal Likelihoods

Statistical applications often involve the calculation of intractable multidimensional integrals. The Laplace formula is widely used to approximate such integrals. However, in high-dimensional or small sample size problems, the shape of the integrand function may be far from that of the Gaussian density, and the standard Laplace approximation can then be inaccurate. We propose an improved Laplace approximation that reduces the asymptotic error of the standard Laplace formula by one order of magnitude, thus leading to third-order accuracy. We also show, by means of practical examples of varying complexity, that the proposed method is extremely accurate, even in high dimensions, improving over the standard Laplace formula. These examples also demonstrate that the accuracy of the proposed method is comparable with that of other existing methods, which are computationally more demanding. An R implementation of the improved Laplace approximation is provided through the R package iLaplace, available on CRAN.


Introduction
Several Bayesian and frequentist applications involve the evaluation of finite integrals of the form

I_n = ∫ exp{−h_n(x)} dx,    (1)

where −h_n(x) is a smooth and concave real function of the d-dimensional real vector x which may depend on a quantity n > 0. For instance, in Bayesian analyses −h_n(x) may be the log-likelihood or the kernel of the log-posterior, while in Generalized Linear Mixed Models (GLMM) it may represent the sum of the log-likelihood and the log-density of the random effects. In both cases, n is related to the information in the sample, and is typically the sample size.
Integral (1) is often intractable but it can be approximated by numerical integration methods, such as Gaussian quadratures, Monte Carlo integration methods or by Laplace's formula (see, e.g., Evans & Swartz, 2000). Numerical integration via quadrature rules is typically accurate but, due to the curse of dimensionality, it is feasible only for low-dimensional integrals, i.e. d < 10 (Evans & Swartz, 1995). Monte Carlo methods are widely applicable to general functions h_n(x), especially for large d, provided I_n is finite. However, Monte Carlo integration is typically simulation consistent, in the sense that it reaches the true value as the number of simulations goes to infinity. Alternatively, when −h_n(x) has a single mode, which becomes increasingly dominant for large values of n, methods based on asymptotic expansions, such as Laplace's formula, are computationally more convenient. Asymptotic expansions around the mode give analytical approximations of I_n which do not require tuning, and which are consistent for n → ∞. Hence, unlike Monte Carlo approximations, the accuracy of asymptotic methods is bounded by some function of the given n. Nevertheless, when −h_n(x) is single-peaked, even for moderate values of n asymptotic approximations may outperform Monte Carlo methods for a given computational cost (see, e.g., Rue et al., 2009; Ruli et al., 2014). This makes asymptotic approximations particularly appealing in applications.
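The trade-off above can be made concrete with a toy Monte Carlo experiment (a stdlib-only sketch with an arbitrary one-dimensional integrand chosen because its value, 2Γ(5/4), is known; not an example from the paper):

```python
import math
import random

# Toy target: I = \int exp{-h(x)} dx with h(x) = x^4, so I = 2*Gamma(5/4).
I_true = 2 * math.gamma(1.25)

def is_estimate(n_draws, seed=0):
    """Importance-sampling estimate of I with a N(0, 1) proposal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(0.0, 1.0)
        proposal = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
        total += math.exp(-x ** 4) / proposal  # integrand / proposal density
    return total / n_draws

# Simulation consistency: the error shrinks like O(n_draws^{-1/2}).
print(abs(is_estimate(100) - I_true), abs(is_estimate(100_000) - I_true))
```

With a fixed budget of draws the error is purely stochastic, whereas a Laplace-type approximation has a deterministic error driven by n.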
We focus on the Laplace method of integration for approximating marginal likelihoods of the form (1), both for Bayesian and frequentist inference. The Laplace method is widely used in the Bayesian literature for approximating posterior densities and posterior moments (see, e.g., Tierney & Kadane, 1986; Rue et al., 2009) or Bayes Factors (see, e.g., Kass & Raftery, 1995), and in a frequentist framework for integrating out random effects in GLMM (see, e.g., Breslow & Clayton, 1993) or to compute marginal likelihoods in group models (Barndorff-Nielsen & Cox, 1994, Sect. 2.8; Pace et al., 2006). In addition, Guihenneuc-Jouyaux & Rousseau (2005) use Laplace's formula within a Markov chain Monte Carlo (MCMC) framework, in which the nuisance parameters are integrated out by the Laplace method and the parameters of interest are integrated by MCMC methods. Furthermore, Ruli et al. (2014) propose a simulation algorithm which draws posterior samples by inverting the approximate cumulative distribution function based on the Laplace approximation for marginal posterior densities. Lastly, Martino et al. (2011) and Rizopoulos et al. (2009) apply the Laplace method in the context of survival analysis and joint modelling of survival and longitudinal data, respectively.
We propose an improved Laplace approximation for integrals of the form (1). This approximation achieves third-order accuracy, i.e. it has error of order O(n^{−3/2}), in a standard asymptotic setting, i.e. when the sample size n diverges and d is fixed. The main idea behind the proposed method is to approximate (1) indirectly, through an approximation of the density given by the normalised integrand function. Such an approximation involves d scalar Laplace approximations for marginal densities (Tierney & Kadane, 1986), which can be easily renormalised numerically. The ratio between the integrand function and the approximated density, both evaluated at an arbitrary given point, provides the improved Laplace approximation of (1). This is similar in spirit to Chib (1995).
With respect to the standard Laplace approximation, the proposed method requires some additional optimisation tasks and d scalar numerical integrations. However, the most demanding task is the localisation of the global mode, which is a requirement also for the standard Laplace method. More details on implementation issues are discussed in Sections 4 and 5.
The rest of the article is structured as follows. Section 2 sets the background. Section 3 introduces the improved Laplace approximation. Section 4 illustrates the proposed method in three applications and Section 5 concludes the paper with some final remarks.

Background
In order to ease notation, in the following we drop n from I_n, h_n(x) and related quantities.
Hence, let h(x) be a smooth function of x = (x_1, ..., x_d), at least twice differentiable and with unique minimum at x̂ = (x̂_1, ..., x̂_d). In addition, we assume that x̂ is the only point for which ∂h(x)/∂x = 0 and that the Hessian matrix of h(x) at x̂, i.e. V(x̂) = ∂²h(x)/∂x∂xᵀ evaluated at x = x̂, is positive definite. Under these conditions, the standard Laplace approximation of (1) is

Î_L = exp{−h(x̂)} (2π)^{d/2} |V(x̂)|^{−1/2}.    (2)
Consider a sample y = (y_1, ..., y_n) of size n from a model with probability mass or density function p(y; θ), θ ∈ Θ ⊆ ℝ^d. In Bayesian applications, −h(·) can be the logarithm of the posterior distribution π(θ|y) ∝ L(θ; y)π(θ), where L(θ; y) is the likelihood function for θ and π(θ) is the prior distribution. In this case, (1) gives the posterior normalizing constant, which can be used, for instance, to compute Bayes Factors (BFs). In GLMM, −h(·) is the logarithm of the product of the likelihood, conditional on the random effects, and the density of the latter.
In this case, the integration is performed with respect to the random effects and (1) gives the marginal likelihood of the fixed parameters given the observed data y.
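To fix ideas, the standard Laplace formula in one dimension reads Î_L = exp{−h(x̂)}{2π/h″(x̂)}^{1/2}. A stdlib-only sketch with an integrand whose exact value is known (the choice h(x) = n(x − log x) is ours, for illustration only):

```python
import math

n = 50  # plays the role of the sample size; larger n = more peaked integrand

def h(x):
    # exp{-h(x)} = x^n e^{-n x}, so the exact integral over (0, inf)
    # is Gamma(n + 1) / n^(n + 1)
    return n * (x - math.log(x))

def hpp(x):
    # second derivative of h
    return n / x ** 2

x_hat = 1.0  # minimiser of h, since h'(x) = n (1 - 1/x)
I_laplace = math.exp(-h(x_hat)) * math.sqrt(2 * math.pi / hpp(x_hat))
I_true = math.exp(math.lgamma(n + 1) - (n + 1) * math.log(n))

# Second-order accuracy: by Stirling's series, I_true / I_laplace = e^{1/(12n) - ...},
# so the relative error is roughly 1/(12 n)
print(I_true / I_laplace - 1.0)
```

Doubling n roughly halves this relative error, which is the second-order behaviour discussed next.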
In the standard asymptotic setting with d fixed and n → ∞, the standard Laplace approximation (2) is second-order accurate. On the other hand, if also d → ∞, the asymptotic expansion requires more terms in order to achieve the same accuracy as in lower dimensions.
For instance, in the fortunate but unlikely case of (1) being factorisable into a product of d identical scalar integrals, the relative error would be of order O(d/n) (Small, 2010, Sect. 6.9). However, in practice n is fixed, and one may question whether such accuracy is sufficient for practical purposes where d may be large as well, and whether it can be improved to some extent.
Most of the approaches for reducing the error of the standard Laplace approximation involve the inclusion of higher-order derivatives of h(x) in the Taylor expansions. Lindley (1980) uses this idea in a Bayesian context. Raudenbush et al. (2000) propose a higher-order Laplace approximation for GLMM, by considering derivatives of h(·) up to the sixth order. Pace et al. (2006) use a similar approach for approximating marginal likelihoods in group models. However, when d > 1, the computation of higher-order derivatives can be demanding.
Similar strategies are the Bayesian Bartlett correction proposed by DiCiccio et al. (1997), and the corrected Laplace approximation of Shun & McCullagh (1995). The latter solution is designed for situations, such as models with crossed random effects, in which the standard Laplace approximation may not be asymptotically valid.
Another improvement of the standard Laplace approximation is proposed by Nott et al. (2009), in which (1) is approximated by the product of scalar blocks, after a preliminary approximate orthogonalising variable transformation. Although the proposal of Nott et al. (2009) tends to work slightly better than the standard Laplace approximation, both methods have the same asymptotic error.
Finally, we notice that the proposed method differs from the integrated nested Laplace approximation (INLA) of Rue et al. (2009). Indeed, in INLA the approximation of the marginal likelihood requires multivariate integration, whereas the improved Laplace method breaks the multivariate integration problem into d univariate integrations, by sequentially applying the renormalised Laplace approximation for scalar marginal densities.

The improved Laplace approximation
Let us consider the probability density function p(x) = exp{−h(x)}/I, where exp{−h(x)} is the integrand function and I is the unknown quantity of interest.
By definition,

I = exp{−h(x)}/p(x),    (3)

so that, if p(x) were known, (3) would readily provide I for every value of x. The idea is to use a suitable estimate p̂(x) of p(x) and to obtain from (3) an estimate Î of I, given by exp{−h(x)}/p̂(x).
This was first explored by Chib (1995) for estimating Bayesian marginal likelihoods, with p(x) approximated by Gibbs sampling (see also Chib & Jeliazkov, 2001, for some extensions to general MCMC methods). In principle the method works for any x. However, it is advisable to locate such a point in a high-density region (Chib, 1995). One possibility is to choose x = x̂, which in our case is also convenient from a computational point of view, as explained below.
We propose to approximate p(x) as follows. First, p(x) is factorised, according to the law of total probability, into d scalar densities. Each factor is approximated using the Laplace formula (2) for marginal densities proposed by Tierney & Kadane (1986) and renormalised numerically. The product of these approximations provides a third-order asymptotic estimate of p(x) which, when substituted in (3), provides the improved Laplace approximation of I, denoted by Î_iL.
In the following, we give the details of the approximation of p(x). Let x_{1:q} = (x_1, ..., x_q) denote the first q components of x, and similarly let x_{q+1:d} denote the last d − q components of x (q < d). Moreover, let x̂_{x_1} be the constrained minimum of h(x) with x_1 fixed, and let x̂_{x_{1:q}, x_{q+1}} be the constrained minimum with x_{1:q} fixed at x̂_{1:q} and x_{q+1} fixed. For the validity of the proposed method we require that h(x) satisfies the assumptions in Section 2. In addition, we also assume that the Hessian of each constrained minimisation is positive definite. Essentially, these assumptions are similar to those required by the standard Laplace approximation for marginal posteriors (Tierney & Kadane, 1986).
The proposed improved Laplace approximation is obtained as follows. We write p(x) as

p(x) = p(x_1) p(x_2|x_1) ··· p(x_d|x_{1:d−1}).    (4)

Then, we approximate separately each factor on the right-hand side of (4) by the Laplace approximation for marginal densities and renormalise it numerically. In particular, using results from Tierney & Kadane (1986), for the marginal density p(x_1) we have

p̂(x_1) ∝ exp{−h(x̂_{x_1})} |V_{2:d}(x̂_{x_1})|^{−1/2},

where V_{2:d}(·) is the (2:d, 2:d) block of the Hessian matrix V(·). Upon numerical renormalisation, this marginal density is third-order accurate. Indeed, let p*(x_1) = p̂(x_1) / ∫ p̂(x_1) dx_1.
Consider now the second ratio of integrals in (4), i.e. the conditional density p(x_2|x_1). As for p(x_1), we apply again the standard Laplace method to the numerator and denominator, but with x_1 fixed at x̂_1. This procedure yields the approximation

p̂(x_2|x̂_1) ∝ exp{−h(x̂_{x̂_1, x_2})} |V_{3:d}(x̂_{x̂_1, x_2})|^{−1/2}.

Since p̂(x_2|x̂_1) is univariate, it can be easily normalised by scalar numerical integration to give p*(x_2|x̂_1). Then, evaluated at x̂_2, it gives a third-order approximation of p(x_2|x_1).
Finally, notice that the last conditional density on the right-hand side of (4), with x_1, ..., x_{d−1} all fixed at their respective modal values, involves neither asymptotics nor optimisations, and only needs renormalisation by numerical integration.
The product of these d − 1 renormalised Laplace approximations and the last conditional density, evaluated at the modal values, delivers p*(x̂) = p*(x̂_1) p*(x̂_2|x̂_1) ··· p*(x̂_d|x̂_{1:d−1}), i.e. a third-order accurate approximation of p(x) at x = x̂. By replacing this approximate density in (3), we obtain

Î_iL = exp{−h(x̂)}/p*(x̂).    (5)

Hence, in a standard asymptotic setting with d fixed and n → ∞, the proposed improved Laplace approximation is third-order accurate. On the other hand, if also d → ∞, it is easy to see that the error of the approximation becomes O(d/n^{3/2}).

Example 1: Fixed d and n → ∞. The aim is to compare the behaviour of the standard (Î_L) and the improved (Î_iL) Laplace approximations with the target value (I) computed by adaptive quadrature methods, as n diverges. In particular, following Davison et al. (2006), we consider a Gompertz nonlinear growth curve model, simulating repeated samples at each value of n. The left panel of Figure 1 reports the log-log plot of Î_iL/I and Î_L/I, both averaged over the repetitions at each value of n, against the sample size n. This plot highlights that the improved Laplace approximation is more accurate than the standard Laplace method, and that the convergence of Î_iL/I to 1 is almost immediate.
The log-log plot of the relative error against n on the right panel of Figure 1 shows that the improved Laplace method achieves third-order accuracy whereas the standard Laplace formula is second-order accurate, with fitted slopes close to −3/2 and −1, respectively.
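The whole construction of Section 3 can be sketched in a self-contained toy (stdlib-only; illustrative, not the iLaplace implementation). We take the factorised two-dimensional integrand x_1^A e^{−x_1} x_2^A e^{−x_2}, whose exact integral is Γ(A+1)² = 36 for A = 3; because the integrand factorises, the constrained minimiser of h over x_2 is x_2 = A for every x_1, so the nested numerical optimisations that a general implementation needs can be skipped here:

```python
import math

A = 3.0  # integrand is x1^A e^{-x1} * x2^A e^{-x2}; true I = Gamma(A+1)^2 = 36

def h(x1, x2):
    return (x1 - A * math.log(x1)) + (x2 - A * math.log(x2))

def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    s, step = f(a) + f(b), (b - a) / m
    for i in range(1, m):
        s += f(a + i * step) * (4 if i % 2 else 2)
    return s * step / 3.0

# Global minimiser of h is (A, A); the Hessian there is diag(1/A, 1/A).
# Standard Laplace (2): exp{-h} * (2*pi)^{d/2} * |V|^{-1/2} with |V|^{-1/2} = A.
I_L = math.exp(-h(A, A)) * (2 * math.pi) * A

# Step 1: Tierney-Kadane Laplace approximation of the marginal of x1, up to a
# constant: profile x2 out at its constrained minimiser (x2 = A here), with
# curvature h_22 = 1/A, then renormalise by scalar quadrature.
def marg_unnorm(x1):
    return math.exp(-h(x1, A)) * math.sqrt(2 * math.pi * A)

p1_star = marg_unnorm(A) / simpson(marg_unnorm, 1e-8, 60.0)

# Step 2: the last conditional p(x2 | x1_hat) needs no asymptotics,
# only numerical renormalisation.
def cond_unnorm(x2):
    return math.exp(-h(A, x2))

p2_star = cond_unnorm(A) / simpson(cond_unnorm, 1e-8, 60.0)

# Identity (3) evaluated at the mode gives the improved approximation (5).
I_iL = math.exp(-h(A, A)) / (p1_star * p2_star)

print(I_L, I_iL)  # standard ~ 34.1, improved ~ 36.0
```

The standard Laplace value is about 5% off, while the improved value matches the exact 36 up to quadrature error; in a non-factorised problem each evaluation of the approximate marginal would require a constrained optimisation, which is the main computational burden noted in Section 5.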
Example 2: Fixed n and d → ∞. Consider the multivariate t/skew-t distribution proposed by Jones (2002), which is given by the multivariate t-Student density centred at 0 and with identity scale matrix, where the marginal density of the first component is replaced with the univariate t/skew-t density (see Jones, 2002, Eq. (6)). As in Jones (2002, p. 95), we consider the parametrisation with a, c and ν, where a > 0 and c > 0 determine the distribution of the skewed marginal and ν, the degrees of freedom (df), controls the tail behaviour of the distribution. The case with a = c = ν/2 leads to the ordinary multivariate t-Student distribution with identity scale matrix and ν df.
We approximate the normalizing constant of the multivariate t/skew-t density, which is obviously 1, with the improved and with the standard Laplace approximations, in two scenarios: one with a = c = 1.5 and one with a = 12 and c = 0.5. For each scenario we consider multivariate t/skew-t densities with dimensions 3, 8, ..., 98. The results are shown in Figure 2. First of all, note that the standard Laplace approximation rapidly deteriorates in higher dimensions, in both scenarios. Second, the standard Laplace approximation is worse for lower degrees of freedom. This is expected, since the Laplace approximation (2) works better with quadratic functions, which in this case are obtained in the limit as ν → ∞ with a = c = ν/2. Finally, with higher skewness (large a and small c, or vice versa) the quality of the standard Laplace approximation again deteriorates. On the contrary, the improved Laplace approximation is not affected by any of these conditions, and gives almost exact answers in both scenarios. A similar example was considered also by Nott et al. (2009), in order to test the accuracy of their modified Laplace approximation. However, their method is substantially less accurate, only slightly improving the poor quality of the standard Laplace approximation. For instance, the normalising constant of the 10-variate t/skew-t density with a = 4, c = 1 and ν = 3 is 0.013 with the standard Laplace method, 0.02 with the modified Laplace approximation of Nott et al. (2009) and 0.9981 with the proposed improved Laplace approximation.

Applications
In this section we illustrate the accuracy of the proposed method in three applications. The first two applications focus on Bayesian inference in nonlinear regression models, where the aim is to choose between the normal and the t-Student error distributions. The third application is a logistic regression model with crossed random effects applied to the well-known Salamander data (see McCullagh & Nelder, 1989, p. 440). The Maximum Likelihood Estimate (MLE), obtained by maximising the approximate marginal log-likelihood computed by the improved Laplace method, is compared with the analogous estimates obtained with other competing methods.
All examples can be reproduced by using the R (R Core Team, 2014) package iLaplace, available online as Supplementary Material.

Nonlinear regression with BOD2 data
Consider the nonlinear regression model y_i = β_1{1 − exp(−β_2 x_i)} + σε_i, where y_i is the response variable, x_i is a covariate, (β_1, β_2) are unknown regression parameters and the ε_i are independent error terms, i = 1, ..., n.
We focus on two possible distributions for the error terms: the normal distribution and the t-Student distribution with unknown df ν. The aim is to choose between them using the BF. For the normal model we need to integrate out the parameter (β, σ), whereas for the t-Student model the parameter to be integrated out is (β, σ, ν).
As an example we consider the BOD2 dataset (Bates & Watts, 1988, p. 305), which concerns a study of biochemical oxygen concentration (y) as a function of time (x). We assume the parameters a priori independent, with a bivariate t-Student distribution centred at zero, with scale matrix 10I_2 and 5 df, for β. For the scale parameter of both models, following the recommendations of Gelman (2006), we assume a half-Cauchy prior with scale equal to 10. Finally, the Jeffreys rule prior proposed by Fonseca et al. (2008) is taken for ν. To avoid numerical issues, σ and ν are considered on the logarithmic scale.
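To show how the integrand of (1) is assembled in this example, the sketch below codes the negative log-posterior kernel h for the normal-error model. It is a simplification under stated assumptions: the data vectors are placeholders (not the actual BOD2 values), and the bivariate t prior on β is replaced by independent scalar t(5) densities with scale √10.

```python
import math

# Placeholder data standing in for BOD2 (the real values are in Bates & Watts, 1988)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
y = [0.47, 0.74, 1.17, 1.42, 1.60, 1.84]

def log_t(z, df, scale):
    """Log density of a scaled Student-t at z (location 0)."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi * scale ** 2)
            - (df + 1) / 2 * math.log1p(z * z / (df * scale ** 2)))

def log_half_cauchy(s, scale):
    """Log density of a half-Cauchy at s > 0."""
    return math.log(2 / (math.pi * scale)) - math.log1p((s / scale) ** 2)

def h(theta):
    """Negative log-posterior kernel in (beta1, beta2, log sigma);
    sigma is on the log scale, with the Jacobian term included."""
    b1, b2, log_sigma = theta
    sigma = math.exp(log_sigma)
    # normal log-likelihood with mean b1 * (1 - exp(-b2 * x))
    ll = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
             - 0.5 * ((yi - b1 * (1 - math.exp(-b2 * xi))) / sigma) ** 2
             for xi, yi in zip(x, y))
    # independent t(5, scale sqrt(10)) priors on b1, b2; half-Cauchy(10) on sigma
    lp = log_t(b1, 5, math.sqrt(10)) + log_t(b2, 5, math.sqrt(10))
    lp += log_half_cauchy(sigma, 10) + log_sigma  # + log|d sigma / d log sigma|
    return -(ll + lp)
```

Minimising h over (β_1, β_2, log σ) gives x̂, after which the standard or improved Laplace machinery of Section 3 applies unchanged.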
We compare the improved Laplace approximation with the standard Laplace and the Bartlett-corrected Laplace approximations. Comparisons are performed also with the importance sampling (IS) approximation and with the method proposed by Chib & Jeliazkov (2001). The Bartlett-corrected Laplace approximation is computed from a 10^7 MCMC sample, which after a burn-in period and suitable thinning resulted in 10^6 final draws. The same posterior sample is used also for computing the marginal likelihood with Chib & Jeliazkov's method. For IS we consider 10^6 draws from the multivariate t-Student distribution with 3 df, centred at the posterior mode, with scale matrix equal to c > 0 times the inverse of the posterior Hessian at the modal value. Finally, we also compare the results with those obtained using the adaptive numerical integration method implemented in the R package cubature. The inaccuracy of the standard Laplace approximation in the case of the t-Student model is most likely due to the non-normality of the marginal posterior of (log σ, log ν). Indeed, a look at the bivariate kernel density estimate of this marginal posterior (not reported here) reveals that it is banana-shaped, and therefore quite far from elliptical. Nevertheless, such a shape is well accommodated by the improved Laplace approximation.

Nonlinear regression with Lubricant data
As a second example we consider the Lubricant data (Bates & Watts, 1988, p. 275), which concern the kinematic viscosity of a lubricant (y) as a function of temperature (x_1) and pressure (x_2). For the ith observation, the model can be written as y_i = f(x_{1i}, x_{2i}; β) + σε_i, i = 1, ..., n, where f is a nonlinear function of temperature and pressure with nine regression parameters. As in the previous example, the aim is to choose between the normal and the t-Student (with ν unknown df) error distributions using the BF. For the normal model, the parameter to be integrated out is (β, σ), whereas for the t-Student model the parameter is (β, σ, ν), with β = (β_1, ..., β_9).
We assume the parameters a priori independent, with a 9-variate t-Student distribution with location 0_9, scale matrix 50I_9 and 5 df for β. Moreover, for σ and ν we take a half-Cauchy distribution with scale 10 and the Jeffreys product rule prior (Fonseca et al., 2008), respectively. To avoid numerical issues, σ and ν are considered on the logarithmic scale.
We compare the improved Laplace approximation with the standard Laplace and the Bartlett-corrected Laplace approximations. Comparisons are performed also with the IS approximation and the method of Chib & Jeliazkov (2001). We do not pursue comparisons with adaptive numerical integration, as the dimension of the integral is too high for the results to be trustworthy within a reasonable amount of time. The Bartlett-corrected Laplace approximation is computed from a 10^7 MCMC sample, which after a burn-in period and suitable thinning resulted in 10^6 final draws. The same posterior sample is used also for computing the marginal likelihood with Chib & Jeliazkov's method with one-block sampling.
For IS we consider 10^6 draws from the multivariate t-Student with 3 df, centred at the posterior mode, with scale matrix equal to c > 0 times the inverse of the posterior Hessian at the modal value.

The various approximations of the posterior normalizing constant (on the logarithmic scale) and of the related log10 BF are illustrated in Table 2. All the approximation methods offer some evidence against the normal model in favour of the more robust t-Student model.
However, compared with IS and Chib & Jeliazkov's method, the standard Laplace and the related Bartlett-corrected approximations appear to be slightly too high, whereas the improved Laplace method gives very similar results.
Also here, the inaccuracy of the standard Laplace approximation is most probably due to the non-elliptically shaped marginal posterior of (log σ, log ν).
Logistic regression with the Salamander data
In GLMM, the marginal likelihood of the parameters (θ, θ_u) is

L(θ, θ_u) = ∫ p(y|u; θ) f(u; θ_u) du,

where f(u; θ_u) is the density of the random effects u. This function can be written also as

L(θ, θ_u) = p(y|u; θ) f(u; θ_u) / f(u; θ, θ_u, y),

where f(u; θ, θ_u, y) is such that ∫ f(u; θ, θ_u, y) du = 1. Hence, in this case, given (θ, θ_u) we aim at integrating out u from the product of the density of the data and the density of u.
As an example, consider the Salamander mating data, first published in McCullagh & Nelder (1989, p. 439). These data have been analysed also by Karim & Zeger (1992), Shun (1997), Booth & Hobert (1999), Bellio & Varin (2005) and Sung & Geyer (2007), among others, and consist of three separate experiments, each performed according to the design given in McCullagh & Nelder (1989, Table 14.3). Each experiment involved matings among salamanders in two closed groups. Both groups contained five species R females, five species W females, five species R males and five species W males. Within each group, only 60 of the possible 100 heterosexual crosses were observed, owing to time constraints. Thus, each experiment resulted in 120 binary observations indicating which matings were successful and which were not.
As in McCullagh & Nelder (1989, p. 441), the data are modelled as if different sets of 20 male and 20 female salamanders were used in each experiment. Let y_ij be the indicator of a successful mating between female i and male j, for i, j = 1, ..., 60, where only 360 of the (i, j) pairs are relevant. Let u^f_i denote the random effect that the ith female salamander has across matings in which she is involved, and define u^m_j similarly for the jth male. The data y_ij are assumed conditionally independent, with

logit{Pr(y_ij = 1 | u^f_i, u^m_j)} = x_ij β + u^f_i + u^m_j,

where x_ij is a 4-dimensional row vector of zeros and ones indicating the type of cross and β is the vector of regression parameters. Moreover, u^f_i and u^m_j are assumed to be independent normal variables with zero mean and variances σ²_f and σ²_m, respectively.
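The integrand of the corresponding marginal likelihood, i.e. the exponential of the conditional log-likelihood plus the log-density of the random effects, can be sketched as follows; the design below is a hypothetical miniature (2 females, 2 males, 3 crosses, intercept-only fixed effect), whereas the real data involve 20 + 20 random effects and 120 crosses per experiment.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def neg_h(beta, u_f, u_m, pairs, X, y, var_f, var_m):
    """-h(u): conditional Bernoulli log-likelihood plus normal log-density of
    the random effects; exp of this is the integrand of the 40-dimensional
    integral (20 female + 20 male effects) in the real data."""
    ll = 0.0
    for k, (i, j) in enumerate(pairs):
        eta = sum(b * xk for b, xk in zip(beta, X[k])) + u_f[i] + u_m[j]
        p = logistic(eta)
        ll += y[k] * math.log(p) + (1 - y[k]) * math.log(1 - p)
    # independent N(0, var) log-densities of the female and male effects
    for u, v in ((u_f, var_f), (u_m, var_m)):
        ll += sum(-0.5 * math.log(2 * math.pi * v) - 0.5 * ui * ui / v for ui in u)
    return ll

# Hypothetical miniature design, for illustration only.
pairs = [(0, 0), (0, 1), (1, 0)]   # (female index, male index) of observed crosses
X = [[1.0], [1.0], [1.0]]          # intercept-only "type of cross" design
y = [1, 0, 1]                      # mating outcomes
val = neg_h([0.5], [0.1, -0.2], [0.0, 0.3], pairs, X, y, 1.0, 1.0)
```

In the real analysis u has 40 components per experiment, and it is this non-factorisable integral that the improved Laplace approximation reduces to a sequence of scalar problems.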
As a first example we consider the estimation of the fixed effects and (σ²_f, σ²_m) for each separate experiment, following the same model structure as in Shun (1997), by maximising the approximate marginal likelihood. The aim is to compare the MLEs based on the marginal likelihood approximated by the improved Laplace method with those based on the modified Laplace approximation proposed by Shun & McCullagh (1995) and Shun (1997). It is well known that in models with crossed random effects the standard Laplace approximation is not asymptotically valid (Shun & McCullagh, 1995), and it may give poor results.
Let β = (β_0, β_{WSf}, β_{WSm}, β_{WSf×WSm}) be the fixed effects, where β_0 is a constant, β_{WSf} is the effect of the dummy variable WSf, which takes value one if the observation is from a species W female and zero otherwise, and so on. The marginal likelihood is obtained by integrating the random effects out of the product of the 120 conditional Bernoulli densities and the normal densities of the 40 random effects, which involves a 40-dimensional integral that cannot be reduced to a product of lower-dimensional integrals, even if the random effects have independent normal distributions. The approximate MLEs for the three separate experiments (reported in Table 3) are compared with those of Shun (1997, Tab. 2 and Tab. 3). In this example the improved Laplace approximation is run in parallel over 11 threads. The maximisation of the approximate marginal likelihood computed with the improved Laplace method is initialised at (β, log σ²_f, log σ²_m) = (0, 0, 0, 0, 0, 0), and in the three experiments it took 5, 3 and 4 minutes, respectively, on a CentOS workstation with 64GB of RAM. Convergence is reached after 17, 13 and 14 iterations, respectively.
From Table 3 we notice that the approximate MLEs obtained with the improved Laplace method are very close to those based on the modified Laplace approximation of Shun (1997).
However, the improved Laplace method is easier to compute than that of Shun (1997), since it does not require derivatives of the integrand beyond the second order.
Consider now the joint analysis of the Salamander data, obtained by combining the three experiments' data independently. In this case the marginal likelihood entails the computation of three 40-dimensional integrals. To compare our method with other results available in the literature, we consider a slightly modified parametrisation of the fixed effects. In particular, here β = (β_{R/R}, β_{R/W}, β_{W/R}, β_{W/W}), where β_{R/R} denotes the effect of the cross between a species R female and a species R male, and so on. The minimisation of the negative marginal likelihood approximated by the improved Laplace method, started at the same point as before, took 20 minutes and converged after 23 iterations.
The approximate MLEs obtained with the improved Laplace method, together with the estimates based on the penalised quasi-likelihood approach of Breslow & Clayton (1993), the standard Laplace approximation and the posterior means obtained with the Gibbs sampler proposed by Karim & Zeger (1992), are illustrated in Table 4. We notice that the standard Laplace approximation underestimates the variance parameters (see also Shun, 1997). The estimates of β and of the variance parameters based on the proposed improved Laplace approximation are quite similar to those of the MC-EM procedure of Booth & Hobert (1999) (see also Sung & Geyer, 2007). Hence, the improved Laplace method gives approximate MLEs that are quite similar to those of MC-EM, but it requires less intervention from the practitioner.

Final remarks
Laplace's method for integrals is still widely used in both Bayesian and frequentist applications. However, when the sample size is small or when the dimension of the integral is high, the second-order Laplace approximation may deliver inaccurate results, or it may even be asymptotically invalid, as with the Salamander data. In this paper we proposed an improved Laplace approximation, and showed its superiority with respect to the standard Laplace method. Moreover, the three applications considered here indicated that our method performs comparably well with other existing gold-standard methods, which are computationally more demanding.
An advantage of the proposed method is that the scalar integrals can easily be run in parallel. Moreover, it can accurately approximate also high-dimensional integrals, provided h(·) satisfies the required assumptions. The main computational burden of our method is given by the constrained optimisations and the Hessian determinants at the constrained optima.

The examples can be reproduced as follows:
• to reproduce the plots in the Gompertz example, use the R code in the file gompertz.R;
• to reproduce the plots in the t/skew-t example, use the R code in the file mvtstud.R;
• to reproduce the analysis with the BOD2 dataset, use the R code in the file bod2.R;
• to reproduce the analysis with the Lubricant dataset, use the R code in the file lubricant.R;
• to reproduce the analysis with the Salamander data, use the R code in the file salamander.R.
Notice that the R code and the iLaplace package may require the installation of additional packages available on CRAN.


Figure 1: Asymptotic error of the improved and standard Laplace methods. Left panel: log-log plot of Î_iL/I (•) and Î_L/I (•) against n. Right panel: log-log plot of the relative errors |Î_iL/I − 1| (•) and |Î_L/I − 1| (•) against n; the solid and dashed lines are the corresponding least-squares regression lines. The empirical slopes are -1.48 with 0.99 confidence interval

Figure 2: Standard (red symbols) and improved Laplace (black) approximations of the normalising constant of the multivariate t/skew-t density.

The core functions needed to run the examples are implemented in C++ and are called from R via the Rcpp package. For the minimisations we use the function nlminb, to which we supply the analytical gradient and Hessian. The nested optimisations are started at the global minimum, and the numerical integrations are performed with the function integrate.

Table 1: BOD2 data. Comparison of the improved Laplace method with adaptive numerical integration over hypercubes, the standard Laplace, the Bartlett-corrected Laplace, importance sampling (IS) and Chib & Jeliazkov's approximations.

From Table 1, we notice that the standard Laplace approximation and its Bartlett-corrected version are quite inaccurate, since both lead to substantial false evidence for the normal model against the t-Student model (see Kass & Raftery, 1995, for the interpretation of the BF). More accurate approximations, such as adaptive integration, IS and Chib & Jeliazkov's method, do not confirm such evidence. The result of the improved Laplace method is in reasonable accordance with IS, Chib & Jeliazkov's approximation and adaptive numerical integration.

Table 2: Lubricant viscosity data. Comparison of the improved Laplace method with the standard Laplace, the Bartlett-corrected Laplace, importance sampling (IS) and Chib & Jeliazkov's approximations.

Table 4: Salamander mating data analysed jointly. Comparison of the improved Laplace approximation with the Monte Carlo Expectation-Maximisation (MC-EM) approach of Booth & Hobert (1999).