An Approximate Likelihood Perspective on ABC Methods

We are living in the big data era, as current technologies and networks allow for the easy and routine collection of data sets in different disciplines. Bayesian Statistics offers a flexible modeling approach which is attractive for describing the complexity of these datasets. These models often exhibit a likelihood function which is intractable due to the large sample size, high number of parameters, or functional complexity. Approximate Bayesian Computational (ABC) methods provides likelihood-free methods for performing statistical inferences with Bayesian models defined by intractable likelihood functions. The vastity of the literature on ABC methods created a need to review and relate all ABC approaches so that scientists can more readily understand and apply them for their own work. This article provides a unifying review, general representation, and classification of all ABC methods from the view of approximate likelihood theory. This clarifies how ABC methods can be characterized, related, combined, improved, and applied for future research. Possible future research in ABC is then suggested.


Introduction
Bayesian models are applied for statistical inference in many scientific fields. The posterior distribution is the main object of Bayesian Statistics and it is the result of the combination of two information sources, namely the prior distribution, which reflects extra-experimental knowledge, and the likelihood function, which formalizes the information provided by the data through the use of a given statistical model.
Posterior inferences with such models can be undertaken by applying classical Monte Carlo (MC) methods, which provide iterative algorithms that can generate approximate samples from the posterior distribution, without the marginal likelihood. They include Markov chain Monte Carlo (MCMC) methods, such as Metropolis-Hastings (MH), Gibbs, slice, and adaptive rejection sampling algorithms [e.g., 227,44]; and include population MC [PMC, e.g., 51], sequential MC [SMC, e.g., 66]), and other importance sampling (IS) methods [e.g., 166]. Alternatively, variational inference (VI) optimization methods can be applied to find a tractable density function that minimizes the divergence to the exact posterior density [30].
However, modern "big data" applications require complex models and demanding computational techniques; in some of these situations, MCMC or VI methods may be extremely slow or even impossible to implement. ABC methods are useful in the general scenario where for the given Bayesian model of interest for data analysis, the likelihood is not easily evaluated or intractable, but it is still possible to either draw samples from this likelihood conditionally on the model parameters [e.g., 98,277,216,23]; or to find a point-estimate of some model parameter function based on a sufficient statistic of the data [e.g., 253], on an empirical likelihood [185], or on a bootstrap method [e.g., 285].
Approximate Bayesian Computational (ABC) methods provide "likelihoodfree" methods for performing statistical inferences with Bayesian models defined by intractable likelihood functions.
The vastity of the literature on ABC methods created a need to review and relate all ABC approaches so that scientists can more readily understand and apply them for their own work. This article provides a unifying review, general representation, and classification of all ABC methods from the view of approximate likelihood theory. Currently, any ABC method (algorithm) can be categorized as either (1) rejection-, (2) kernel-, and (3) coupled ABC; and (4) synthetic-, (5) empirical-and (6) bootstrap-likelihood methods; and can be combined with classical MC or VI algorithms. However, given the vast ABC literature, all of these methods may appear very different to scientists. Further, all 22 reviews of ABC methods [179,209,177,21,28,62,114,225,242,174,263,35,176,248,12,226,80,150,165,72,243,78] have covered rejection and kernel ABC methods, but only three covered synthetic likelihood, one reviewed the empirical likelihood, and none have reviewed coupled ABC and bootstrap likelihood methods.
This article provides a unifying review, general representation, and classifications of all ABC methods, from the approximate likelihood theory view [79,Ch.24]. This clarifies how these methods can be characterized, related, combined, improved, and applied for future research.
Next, Section 2 begins to set specific ideas by describing some examples of Bayesian models defined by intractable likelihoods for which ABC is known to be successful, while explaining ways in which likelihoods can be intractable. Then, Section 3 presents, for all ABC methods, the general approximate likelihood representation to unify and classify them, and a general iterative IS (and MH) MC algorithm for sampling approximate posterior distributions. On this basis, Section 4 reviews the six types of ABC methods; Section 5 itemizes ABC methods that have been combined with either the classical MC, VI, or simulated annealing algorithms; Section 6 covers ABC model choice methods; and Section 7 summarizes the available statistical software packages for ABC. Section 8 describes some open problems in ABC. Section 9 concludes the article.

Examples of Bayesian Models with Intractable Likelihoods
A Bayesian model is defined by a parameter vector θ, having space Θ ⊆ R d , a likelihood measure f (y n | θ) for a given data set (y n = (y i ) n i=1 ) of n sample observations with sample space Y n , and by a prior distribution (measure) with density π(θ) defined on the parameter space, Θ ⊆ R d . The likelihood and prior densities are each a continuous p.d.f. and/or a discrete p.m.f., corresponding to c.d.f.s F (y n | θ) and Π(θ), respectively.
According to Bayes theorem, a set of data (y n ) updates the prior to a posterior distribution, defined by probability measure π(θ | y n ) = f (y n | θ)π(θ)/m(y n ), with marginal likelihood normalizing constant m(y n ) = f (y n | θ)Π(dθ). Also, f n (y) = f (y | θ)Π(dθ | y n ) is the posterior predictive density of a future observable y. However, for many Bayesian models, the likelihood f (y n | θ) is analytically and/or computationally intractable. Then, posterior inferences are infeasible using analytical, MCMC, SMC, PMC, IS, VI, or other appropriate methods.
We now proceed to describe three examples of Bayesian models defined by intractable likelihoods, each of which has a posterior distributions that can be estimated using ABC methods.

g-and-k Distribution
The first example of an intractable likelihood is defined by the g-and-k distribution [172]. This distribution extends the normal distribution by allowing for added skewness or heavier (lighter) tails, and can describe a wide variety distribution shapes with four interpretable parameters. The generalized g-and-k distribution [172] is defined by the quantile function: where z u = N −1 (u | 0, 1) is the standard normal N(0, 1) quantile function, and has parameters θ = (A, B, g, k). Here, A ∈ R is a location parameter, B > 0 is a scale parameter, g ≥ 0 controls skewness, k ≥ 0 controls kurtosis (tail size), and c = .8 provides a standard choice of overall asymmetry constant [224,171]. The likelihood p.d.f. of the g-and-k distribution has the form f (y n | θ) = n i=1 f (y i | θ), assuming that the given set of n sample observations y n = (y i ) n i=1 are i.i.d. from f (y | θ). A Bayesian g-and-k model is completed by the specification of a prior distribution π(θ) on the space of the model parameters θ = (A, B, g, k). However, the g-and-k likelihood p.d.f. has no closed-form expression in general. Instead, the likelihood p.d.f. is expressible in terms of derivatives of quantile functions, and needs to be computed completely numerically for each of the individual data points y i [224,211]. As a result, the computation of the g-and-k likelihood is slow even for a moderate data sample size n, and hundreds of time slower than computing the normal p.d.f. [224,211]. Then, for the inference from the posterior distribution π(θ | y n ) ∝ f (y n | θ)π(θ) of the Bayesian g-and-k model, any standard MCMC approach is computationally slow because it requires making n calls to numerical optimization to evaluate the likelihood p.d.f. f (y n | θ) in each MCMC sampling iteration [211]. ABC can be employed for performing inferences from the approximate posterior distribution of the parameters θ = (A, B, g, k), based on the specification of a surrogate likelihood that approximates the exact g-and-k model likelihood, but is less computationally costly. The rejection ABC (R-ABC) method (Section 4.1) has proven to be a viable ABC method for this model, which is based on finding parameter values which produce simulated (synthetic) data sets that are similar to the observed data sets y n , based on summary statistics of each data set [for details, see 211].

Mixed-Effects Model
The second example is given by the general mixed-effect model, which for the jth observation within blocking factor k, is given by: for j = 1, . . . , J k and k = 1, . . . , K, and n = K k=1 J k , where the x jk are the fixed-effect predictor vectors, assuming b k iid ∼ F ζ and ε jk iid ∼ F σ with the b k and the ε jk uncorrelated. The mixed-effects model (2.2) has corresponding likelihood: A Bayesian mixed-effects model is completed by the specification of a prior distribution π(θ) = π(β, ζ, σ), with corresponding posterior p.d.f. π(θ | y n ) ∝ f (y n | θ)π(θ). The likelihood (2.3) is intractable virtually impossible to evaluate unless F ζ and F σ are standard distributions such as normal distributions. But it is wellknown that empirical violations of these assumptions would cast doubt about the accuracy of the inferences from the posterior π(θ | y n ). In principle, any of the Approximate Bayesian Computational (ABC) methods (Section 4) can be implemented to perform posterior inferences of the Bayesian mixed-effects model with intractable likelihood.

Hidden Potts Model
The third example is provided by the Potts model, which originated in statistical physics, and is now widely used for applications in image processing, spatial modelling, computational biology, and computational neuroscience. To explain this model, let i ∈ {1, . . . , n} denote the pixels (nodes) of an image lattice, where for each node i, the value y i ∈ {1, . . . , k} denotes the node's state among k possible states. The Potts model is a Markov random field model defined in terms of its conditional probabilities: for i = 1, . . . , n, where θ ≥ 0 is the inverse temperature (scale) parameter, i ∼ ℓ are the neighboring pixels of i, and δ(·, ·) is the Kronecker delta function. For example, using a first-order neighborhood, i ∼ ℓ refer to the four pixels immediately adjacent to internal node i of the image lattice, and pixels on the image boundary have less than four neighbors. A Bayesian Potts model is completed by the specification of a prior distribution π(θ) on [0, ∞). According to Bayes' theorem, given the data y n = (y i ) n i=1 , the posterior distribution of the Potts model is given by π(θ | y n ) ∝ f (y n | θ)π(θ), with likelihood defined by: . (2.5) The Potts model likelihood (2.5) is intractable because its denominator involves a sum over all k n possible combinations of the labels y n ∈ ℑ. Clearly, the likelihood computation time, and the inference of the posterior distribution π(θ | y n ), depends on the size n of the image lattice. For the inference of the posterior distribution π(θ | y n ), for large n, an MCMC algorithm would be virtually impossible to implement because it would require evaluating the likelihood, and rejection ABC (R-ABC) is time consuming because simulating a (synthetic) data set from the likelihood (2.5) is computationally costly. However, the synthetic likelihood (SL-ABC) and bootstrap likelihood (BL-ABC) approaches to ABC have proven to be successful in providing inference from the posterior distribution π(θ | y n ) of the Bayesian Potts model with relatively low computational cost [see 188,285]. the general SL-ABC and BL-ABC methods are described in Section 4.

Approximate Likelihoods and Sampling Algorithm
We now introduce the general unifying representation of all ABC methods. Each ABC method provides approximate inference of the posterior distribution π(θ | y n ) = f (y n | θ)π(θ)/m(y n ) for a given Bayesian model defined by an intractable likelihood f (y n | θ), and by implication, an intractable marginal likelihood m(y n ).
Specifically, ABC provides tractable posterior inference by replacing the exact likelihood with an approximate likelihood that admits the general representation: Above, t(y n ) is a vector of summary statistics of the data y n (possibly, t(y n ) ≡ y n ), which ideally are of low dimension (dim) and sufficient as the given Bayesian model and data set permit; parameter identifiability requires dim(t) ≥ dim(θ).
) is the kernel density function, given a parameter estimate iid ∼ f (· | θ) of size n(k) from the exact model likelihood. ABC methods that use N = 1 set t(z n(k) ) ≡ t(z n ), and methods that sample no synthetic data (N = 0) set An unbiased estimator of the general approximate likelihood (3.1) is given by: For discrete data, the indicator function kernel provides an unbiased estimator of the exact model likelihood f (y n | θ) [234]. For realistic settings, the equality constraint is replaced by some kernel K in (3.2) that tolerates some small level of inequality.
The general likelihood (3.1) gives rise to the approximate posterior density: with m L (θ) = L η (y n | θ)Π(dθ). Theoretically, (3.3) exemplifies a limited information likelihood posterior when η θ ≡ θ and t(y n ) : = y n [214,71,284,149,142], and indirect inference [104,115,132] when η θ = θ [74]. Using the general approximate likelihood representation (3.1), Table 1 summarizes the six types of ABC methods proposed in the literature. They differ by the choice of kernel function (K), the number (N ) and method of sampling iid synthetic data sets from the exact model likelihood f (y n | θ), and by whether or not it implements indirect inference.
For any one of these six types of ABC methods, a general IS algorithm can be employed for ABC inference of the posterior π L (θ | y n ) in (3.3), for any function g(θ) of interest, using the prior π(θ) as the instrumental density. For the general inference of g(θ)π L (θ | y n )dθ, this integral can be rewritten as: and estimated via IS by employing a sampling scheme of the form: iid ∼ π(θ). The general IS algorithm is given by: The ABC-IS algorithm yields output of S samples of the model parameters and normalized sampling weights, (θ s , ω s = ω s / S j=1 ω j ) S s=1 , which can be used to construct estimates of the posterior π L (θ | y) for parameter functions of interest. The convergence of the output can be evaluated by the Effective Sample Size statistic, ESS = 1 S s=1 {ω s / S s=1 ω s } 2 , which ranges from 1 to S (perfect outcome where (θ s ) S s=1 are iid) [166]. The ABC-IS algorithm is conceptually simple, and can be easily parallelized because Step (a) draws independent samples from the prior π(θ) over iterations. Also, this algorithm can be applied to any one of the six types of ABC methods, with respect to specific choices of N (for Step (b)) and the kernel function K defining the importance sampling weight (for Step (c)), as summarized in second and third columns of Table 1. More details in Section 4. We want to stress that some users of the 6 different ABC methods may not at first immediately recognize the ABC-IS algorithm, but the importance sampling feature is indeed an integral part of these methods. For example, the popular, rejection ABC (R-ABC) method employs the same algorithmic Step (a) to generate a prior sample θ s ; and then samples N = 1 synthetic data set z (s) n(k) in Step (b) from the model likelihood f (y n | θ s ); and then in Step (c) employs an IS weight defined by a binary (0 or 1) valued function indicating whether the summarized data t(y n ) is close in distance to the summary t(z (s) n(k) ) of the sampled synthetic data set. Alternatively, an MH algorithm can be used (with weights ω s ≡ 1), by changing the ABC-IS algorithm so that step (a) draws θ * ∼ q(θ s | θ s−1 ) from a proposal density (distribution) q, and step (c) accepts θ s ≡ θ * with probability: based on the MH acceptance ratio. Either the ABC-IS algorithm or the Metropolis algorithm can be extended to versions (resp.) which adaptively find the optimal instrumental or proposal density (resp.) over iterations, with the aim of yielding Monte Carlo samples of θ that more quickly converge to samples from the posterior distribution π L (θ | y n ). This includes Population Monte Carlo (PMC) [51], adaptive multiple importance sampling [55], and adaptive Metropolis algorithms [e.g., 231], among others.
The following review of ABC methods is cast within the ABC-IS algorithm, for simplicity and without loss of generality, unless otherwise indicated.

ABC Methods
Sections 4.1-4.6 review the six types of ABC methods (Table 1). They each can be combined with other MC, VI, or simulated annealing algorithms, as mentioned in Section 5.

Rejection ABC (R-ABC)
The R-ABC method is the oldest [98,277,216,23]. R-ABC draws N = 1 synthetic data set z n ∼ f (· | θ) in ABC-IS step (b), and then in step (c), it accepts the sample θ s when the distance ρ t(yn),t(zn) between t(y n ) and t(z n ) is within a chosen tolerance ǫ ≥ 0, thereby using the IS weight ω s ≡ 1(ρ t(yn),t(zn) ≤ ǫ) with indicator (kernel) function 1(·). The sampling algorithm is run until the desired number of acceptances is obtained. Actually, the first R-ABC approach [253] did not employ a sampling step, but instead based rejection decisions on a maximum likelihood estimate θ = arg max θ∈Θ f t (t(y n )|θ) using a sufficient summary statistic t(y n ) of the data. However, this early method is limited to relatively simple Bayesian modeling scenarios in which f t (t(y n )|θ) can readily be computed and maximized over θ, whereas this limitation can be avoided by employing the sampling step [23, pp.2025-6].
R-ABC approximates the exact model likelihood f (y | θ) by the proportion of synthetic data sets that are similar to the observed data [69]. For discrete data y n it may be natural to use no tolerance (ǫ = 0) and summary statistics (t(y n ) ≡ y n , t(z n ) ≡ z n ) [253]. For continuous data, some tolerance and summary statistics are more useful [98]. When t is sufficient, the R-ABC posterior density function . Then, π(θ | y n ) = π(θ | t(y n )) for all θ [144,154]. Hence the tolerance ǫ represents model error [280]. Sampling only one (N = 1) synthetic data set per IS iteration provides a good trade-off between computational and posterior estimation efficiency. This is also true for kernel ABC (Section 4.2) and R-ABC-MCMC (Section 5) [38].
decrease or eliminate tuning parameter dependence, as described next in Sections 4.2-4.6.
For K-ABC and R-ABC, if the summary statistic(s) obeys a central limit theorem, then the posterior mean estimator of any parameter function can be asymptotically normal (as n → ∞), with mean square error equal to that of the maximum likelihood estimator based on the summary statistic(s). Further, for kernel ABC, this estimator is efficient in the sense that the MC error of ABC increases its mean square error only by a factor of 1 + O(1/S) for a fixed MC sample size S [159,65].

Synthetic Likelihood (SL) Methods
A SL method generates multiple (N ≥ 1) samples of synthetic data sets per sampling iteration. The classical SL method assumes the approximate likelihood to be the multivariate normal p.d.f. n(· | µ, Σ), with η {t(z n(k) )} θ,N = ( µ θ,N , Σ θ,N ) the maximum likelihood estimate (MLE) obtained from {t(z n(k) )} N k=1 , assuming that the summary statistics t are asymptotically normal [283,121]. In terms of the ABC-IS algorithm, iterative step (a) generates a prior sample θ s ∼ π(θ); θ,N ) from N sampled synthetic data sets, and step (c) weights θ s by ω s ≡ n(t(y n ) | µ Several other SL methods relax the normality assumption, including those that replace the normal SL by: a GARCH model with time-dependent normal heteroscedastic variances [99]; a finite mixture of normal experts model with covariate-dependent kernel densities and mixture weights [88]; a Gaussian process (GP) for each individual (marginal) summary statistic, assuming independent summary statistics [184,126]; a GP for the log SL [281], estimated by sequential history matching [58]; a GP for the discrepancy between observed and simulated synthetic data [111], estimated by Bayesian optimization [136]; and other general SLs [99,74,77,111].

Empirical Likelihood Method (EL-ABC)
EL-ABC specifies the kernel function by the empirical likelihood (EL), , and h(y, θ) = y − θ, then the constraint is θ = n i=1 p i y i . The EL (4.4) assumes n i.i.d. observations, but it can handle dependent observations by reformulating it as a dynamic regression model that captures underlying iid structure. In terms of the ABC-IS algorithm, EL-ABC in iterative step (a) draws a prior sample θ s ∼ π(θ); in step (b) finds the EL L el n (y n | θ s ); and in step (c) assigns the IS weight ω s ≡ L el n (y n | θ s ).

Bootstrap Likelihood Method (BL-ABC)
BL-ABC [285] first estimates the exact model likelihood by a kernel density estimate L η (y n | θ) ≡ L bl n (y n | θ) of nested bootstrap samples of a point-estimator θ of θ, from the original data y n [64]. The nested bootstrap has two stages. At the first stage, J bootstrap samples of data sets {y where ker(·) is a smooth (e.g., Epanechnikov) kernel with bandwidth h > 0. Then the bootstrap likelihood (BL) is constructed by fitting a scatterplot smoother to the J pairs {( θ * j , log f ( θ n | θ * j ))} J j=1 . Here, θ n = θ(y n ), and f ( θ n | θ * j ) provides an estimate of the likelihood of θ n given θ ≡ θ * j [64]. For BL-ABC, the ABC-IS algorithm is run with step (a) drawing a prior sample θ s ∼ π(θ); step (b) is skipped; and step (c) calculates the importance sampling weight by ω s ≡ L n (y n | θ s ). The empirical and bootstrap likelihoods asymptotically agree to order n −1/2 , and converge to the true model likelihood f (y | θ) as n → ∞ [64].
BL-ABC can be extended in a few ways, owing to the general nature of the bootstrap method. First, if θ − θ is a pivotal quantity such that θ − θ ∼ H, with H not involving θ, then the bootstrap likelihood estimate L bl n (y n | θ) can be more simply constructed by the ordinary single bootstrap [37]. Second, BL-ABC can be easily extended to handle non-iid dependent data through the use of the regression residual, parametric, or the pairs bootstrap [285]. Third, BL-ABC can incorporate R-ABC, but at a higher computational cost [285]. Fourth, if the model parameter is scalar-valued (θ ≡ θ), BL-ABC can be used to construct an empirical likelihood using a single bootstrap, without kernel density estimation [198]. Finally, BL-ABC can been extended to "n choose m" or m/n bootstrap (m ≤ n) sampling per iteration [161].  11,189,123] typically provide automatic tolerance (ǫ) selection. The variance bounds and geometric ergodicity of R-ABC-MCMC were studied in relation to that of the MH algorithm for models with intractable likelihood [153]. Also, VI including expected-propogation (EP) methods were proposed to define K-ABC-EP, R-ABC-VI and K-ABC-VI methods [15,16,259], and simulated annealing (SA) was proposed to define a K-ABC-SA method [5].
Random forests [42] was proposed to estimate the MH acceptance ratio as a function of simulated summary statistics [205], based on bootstrap aggregation [41] of the optimal classification predictions of many Classification and Regression Trees [CARTs; 43] fitted over resamples of the data y n . Classical SL's [283] normal synthetic likelihood amounts to assuming the quadratic discriminant analysis classifier.

ABC for Model Choice
Model choice aims to find the model from a set of D considered models {M d } D d=1 that has the best predictive utility for the underlying process that generated the given data set, y n . There are several ABC methods for model choice [216].
One method for R-ABC or K-ABC estimates the posterior model probabilities from a multinomial logit regression of model indices on summary statistics t(z n ) sampled from an R-ABC algorithm, locally-weighted and conditionally on t(x n ) [86, 20,33,84,212]. A method for R-ABC [97] approximates each model's deviance information criterion [246] by using kernel density estimation (tolerance ǫ bandwidth) of the deviance statistic, obtained from summary statistics of posterior predictive samples generated per sampling iteration. Another R-ABC method [143] employs an MH algorithm that proposes jumps between models, based on the pseudo-marginal approach that handles intractable likelihoods [9]. A general ABC model choice method employs random forests [217], as follows. First, a prior probability π(M d ) is assigned to each model M d with intractable likelihood f d (z n | θ d ) and prior π d (θ d ), for a given set {M d } D d=1 of compared models. Then a table is constructed by taking N ref samples of (d, t(z n )) from π(M d )π d (θ d )f d (z n | θ d ). Next, the method fits a CART T b that classifies model indices with t(z n ), to each of B bootstrap samples (with replacement) of size N boot < N ref from the table. Finally, the single best predictive model from is identified as the one with the model index that receives the most votes (optimal classifications) from the B fitted CARTs.
Finally, R-ABC, R-ABC-MCMC (MH), and R-ABC-SMC algorithms can be extended for assessing the fit of a single model to data [221,219,228,220].

Software Packages for ABC
There are at least 28 statistical software packages that support many of the ABC methods that were mentioned earlier [in part, from 195, 145]. Table 2 provides a summary of these packages. Nearly all packages are based on R-ABC, while a few recent packages handle K-ABC or SL-ABC. Also, 9 of these packages can be applied to general models, while all the other packages address specific models, scientific fields, or focuses on model selection tasks. According to Table 2 and the review of applications given in Section 1, R-ABC seems to be most widely applied in the population genetics, phylogeography, systems biology, archaeology, and cosmology fields. However, in the future it is expected that fields will more frequently apply the more recent ABC methods since they have some advantages over R-ABC.

Open Problems in ABC
The six types of ABC methods have proven useful for many applications of data analysis. However, they are not fully satisfactory for reasons that are summarized in Table 3. Table 3 defines the current open problems with ABC, described in the following subsections. of function of θ estimation resampling.
No Table 3 The open problems in ABC, including the computational complexity of each ABC method.
The table conveys that several aspects can together contribute to the computational complexity of each ABC method, such as the sample size or the number of model parameters or variables. However, these issues may potentially be less important as increasing computing power continues to become more available [248].
A typical choice of ρ is the squared distance. But posterior inferences differ across other reasonable choices of distance measures, and there does not seem to be one "best" distance measure for general ABC practice. K-ABC employs a smooth kernel function K δ (ρ) of ρ. The kernel bandwidth needs to be carefully chosen, perhaps by matching the resulting posterior credible intervals with the correct coverage levels [213].
When employed, summary statistics t should be sufficient. But sufficient statistics are not available from many Bayesian models and data sets. Insufficient summary statistics when used should be carefully selected to ensure accurate posterior inferences [e.g., 68,175,230]. Many statistics selection methods have been developed [137,275,194,93,3,35,102,133,236,59,187].
In R-ABC, the tolerance ǫ represents model approximation error [280] and controls the trade-off between computational speed and estimation accuracy. Running a R-ABC algorithm with overly-small ǫ requires many model evaluations due to the large rejection probability, making it time-consuming to get a large number of acceptances. When ǫ is too large, the algorithm poorly approximates and overstates uncertainty in the posterior distribution. Several proposed solutions include selecting the tolerance from the MC sampling run [241,24,258,67,240,156,52] or from posterior asymptotics [13]; and running a R-ABC algorithm for each of the n components of a factorized exact model likelihood via the Markov property (if factorizable), possibly with lower tolerance and computational cost and without (ρ, t) [279,15,18].
In SL-ABC, N controls the trade-off between computational speed and estimation accuracy [e.g., 77], and inferences may be highly-sensitive to N when the SL is not smooth [111,Section 4]. Recent extensions of SL-ABC can decrease this sensitivity [215]. Some issues are avoided by employing ABC methods that do not rely on all tuning parameters (ρ, t, ǫ). K-ABC and C-ABC do not use ǫ and may not require t, and SL-ABC does not use (ρ, ǫ), but they depend on other tuning parameters. C-ABC requires solving the inverse problem, which is not possible for all model likelihoods. For R-ABC, a method can select (ρ, t, ǫ) [223]. The factorization method does not require (ρ, t).
In summary, there are many proposed solutions to address tuning parameter dependence in ABC, but none appear to be neat and fully satisfactory.
EL-ABC and BL-ABC do not employ any tuning parameters. But they still require a point estimator for p(θ) and θ (resp.), which may not be available for the given Bayesian model at hand. In EL-ABC it can be difficult to specify the moment constraints needed to estimate p(θ). See Sections 8.2-8.3 for related issues.

The Curse of Dimensionality
The curse of dimensionality [25] can pose challenges to the tuning parameters. In R-ABC, K-ABC, and C-ABC, squared distance is a typical choice for ρ, but it suffers from the curse beyond three dimensions [e.g., 119], perhaps also true for general distance measures of summary statistics [35]. Even when sufficient statistics are available, they often have the same dimensionality as the sample size. For instance, while dim(t) ≥ dim(θ) is required for parameter identifiability, this implies that high-dimensional summary statistics t are required to adequately describe the information of a high-dimensional θ. It is not uncommon in ABC practice that dim(t) ≫ dim(θ) [6,39]. Otherwise, low-dimensional sufficient statistics may be computationally intractable.
Proposed solutions include: performing a (linear, local, or nonlinear) regression of accepted proposed parameter values onto the distance between the simulated and observed summary statistics, and then estimating the posterior distribution by sampling the regression error distribution conditionally on zero distance [23, 256,31,32,193,111]; performing regression adjustments on lowdimensional marginal posterior distributions and summary statistics [193,160], since low-dimensional posteriors can often be estimated well; methods to reduce the dimensionality of the summary statistics [35]; and the factorized likelihood method using no tuning parameters (ρ, t) [279].
A high-dimensional model parameter θ also affects EL-ABC and BL-ABC by making point estimation of p(θ) and θ (resp.) more challenging and time consuming, especially when the data sample size (n) is large. Then, in EL-ABC the constraints for p(θ) are difficult to specify [285], and in BL-ABC a large number of bootstrap samples are needed for reliable kernel estimation of the approximate likelihood. Low-dimensional summary statistics may help.
The ABC-PaSS algorithm [146] (i.e., ABC for Parameter Specific Statistics ABC) was designed to tackle the problem of high-dimensionality that arises frequently in inference for complex models, which are typical in biology and population genetics. This approach is based on Gibbs sampling, where each parameter in the model is updated independently, and this update is accepted or rejected based on the Euclidean distance of parameter-specific statistics to the observed data.

Large Sample Size (n) Issues
For R-ABC, K-ABC, and SL-ABC, simulating N ≥ 1 synthetic data sets of size n per sampling iteration can be computationally costly, especially when the data sample size n is large. For certain parameter space regions, a small number of simulations N may suffice to conclude that the approximate likelihood cannot take on a significant value. Proposed solutions to such issues include an extension of SL-ABC that uses Bayesian optimization for estimating SL parameters instead of simulating synthetic data [111]; and an R-ABC IS algorithm that saves time by sometimes abandoning the simulation of synthetic data sets that likely poorly match the observed data [210].
Approximate ABC (AABC) can provide a faster R-ABC method to simulate synthetic data sets in settings where it is too computationally costly to directly simulate from the exact model likelihood [46]. AABC initializes the ABC-IS algorithm by drawing a small number N * of prior samples {θ * } N * k=1 iid ∼ π(θ) and synthetic data sets Then after drawing θ s in iterative step (a), AABC in iterative step (b) draws φ from a Dirichlet distribution with precision parameters given by the Epanechnikov kernel weights of {θ * } N * k=1 centered at θ s , and samples a synthetic data set z n = (z i ) n i=1 by drawing z i = z * j with probability φ j for i = 1, . . . , n.
Large sample size (n) also slows point-estimation. This seems to be true for SL-ABC extensions that estimate a GP per sampling iteration because this can involve multiple inversions of n×n matrices, and true for EL-ABC and BL-ABC which estimate p(θ) and θ (resp.) in each iteration.

ABC Model Choice Issues
All current ABC model choice methods employ summary statistics t, and also a tolerance ǫ > 0 in some cases. Coherent R-ABC model choice requires sufficient summary statistics [109,68,230,45,175,48]. The results of ABC model choice can be inconsistent when insufficient summary statistics are employed [230], and can depend heavily on the information amount in the given data set [247]. But as mentioned, sufficiency is not provided by many models and data sets, and tolerance selection is not trivial.
Nearly all of the ABC model choice methods assign a prior distribution π(M d ) on the model space. This implicitly assumes the M-closed view of model choice, namely that the true model that generated the data y n is in the set of models {M d } D d=1 under consideration [27]. This view is not uncontroversial because the true model is often unavailable in practice, which anyway may not accurately predict future data given the available evidence y n [266]. Alternatively, an M-open view may be adopted, avoiding the explicit construction of an actual true belief model. In this case, model choice can proceed by comparing models by posterior predictive performance of future data. This can be done under minimal modeling assumptions by employing sample re-use cross-validation methods to approximate the future data distribution.

Conclusions
ABC provides useful inferential methods for Bayesian models with intractable likelihoods. On the basis of approximate likelihood theory, this article provided a unifying review of ABC methods in the huge related literature. This allowed for the methods to be concisely described, classified, and directly compared, with the aim of promoting future use of ABC methods among scientists. The review also informed a summary of some of the current open problems with ABC methods. The review as a whole suggests some future research directions in ABC, possibly by combining some of the virtues of K-ABC, SL-ABC, and BL-ABC, while avoiding some of their issues.