acebayes: An R Package for Bayesian Optimal Design of Experiments via Approximate Coordinate Exchange

We describe the R package acebayes and demonstrate its use to find Bayesian optimal experimental designs. A decision-theoretic approach is adopted, with the optimal design maximising an expected utility. Finding Bayesian optimal designs for realistic problems is challenging, as the expected utility is typically intractable and the design space may be high-dimensional. The package implements the approximate coordinate exchange algorithm to optimise (an approximation to) the expected utility via a sequence of conditional one-dimensional optimisation steps. At each step, a Gaussian process regression model is used to approximate, and subsequently optimise, the expected utility as the function of a single design coordinate (the value taken by one controllable variable for one run of the experiment). In addition to functions for bespoke design problems with user-defined utility functions, acebayes provides functions tailored to finding designs for common generalised linear and nonlinear models. The package provides a step-change in the complexity of problems that can be addressed, enabling designs to be found for much larger numbers of variables and runs than previously possible. We provide tutorials on the application of the methodology for four illustrative examples of varying complexity where designs are found for the goals of parameter estimation, model selection and prediction. These examples demonstrate previously unseen functionality of acebayes.


Introduction
A well-planned and executed experiment is an efficient and effective way of learning the effect of an intervention on a process or system (Box et al. 2005), and design of experiments is a key contribution of statistics to the scientific method (Stigler 2016, ch. 6). Statistical design is an "a priori" activity, taking place before data is collected, and so fits naturally within a Bayesian framework. Before experimentation, current knowledge on models and parameters can be represented by prior probability distributions, and the experimental aim (e.g., parameter estimation, model selection, or prediction) can be incorporated into a decision-theoretic approach through the specification of a utility function (Berger 1985, ch. 2). A Bayesian optimal design is then found by maximising the expectation of this utility over the space of all possible designs, where expectation is with respect to the joint distribution of all unknown quantities including the, as yet, unobserved responses (Chaloner and Verdinelli 1995).
To formalise, suppose the aim of an experiment that varies k variables in n runs is to estimate or identify quantities γ = (γ 1 , . . . , γ p ) , which, for example, may be (functions of) parameters, model indicators or future responses. We perform this task using data y = (y 1 , . . . , y n ) ∈ Y ⊂ R n collected using a design d ∈ D ⊂ R n×k , an n × k matrix with ith row x i = (x i1 , . . . , x ik ) holding the treatment, or combination of values of the controllable variables, assigned to the ith run of the experiment (i = 1, . . . , n). We refer to nk as the dimensionality of the design and the x ij as coordinates of the design (j = 1, . . . , k).
A decision-theoretic Bayesian optimal design d maximises the expected utility u(γ, y, d)π(γ, y|d) dγ dy (1) = u(γ, y, d)π(γ|y, d)π(y|d) dγ dy (2) = u(γ, y, d)π(y|γ, d)π(γ|d) dγ dy , with utility function u(γ, y, d) providing a measure of success of design d for quantities γ and data y, and π(·|·) denoting a conditional probability density or mass function. The density/mass π(γ|d) quantifies prior information about γ available before the experiment is run. The equivalence of Equations 2 and 3 follows from application of Bayes theorem; Equation 2 more clearly shows the dependency on the posterior distribution, whereas Equation 3 is often more useful for calculations and computation.
Although straightforward in principle, there are several hurdles to the practical evaluation and optimisation of Equations 2 or 3, highlighted in recent reviews by Ryan et al. (2016) and Woods et al. (2017).
1. The expected utility often results from an analytically intractable and, typically, highdimensional integral. For such utilities, U (d) may be approximated by a weighted sum for w b > 0. For example, a Monte Carlo approximation would sample {γ b , y b } B b=1 from the joint distribution π(γ, y|d) and set w b = 1/B; maximisation ofŨ (d) would be a stochastic optimisation problem. For simple utility functions not depending on data y, a deterministic quadrature approximation may be applied, with γ b and w b being quadrature abscissae and weights, respectively.
2. The design space may be continuous and of high dimension.
3. The utility function itself may not be available in closed form, requiring an approximationũ(γ, y, d) to be substituted into Equation 4.
Most Bayesian optimal design methodology in the literature has been restricted to lowdimensional designs, i.e., small values of nk. See Müller and Parmigiani (1995), Müller (1999), Amzal et al. (2006), Long et al. (2013), and Drovandi et al. ( , 2014, where the largest designs found had nk = 4. To find designs with larger n for k = 1 variable, Ryan et al. (2014) chose design points as quantiles of a parametric distribution. Although such an approach reduces the dimension of the optimisation problem (e.g., to finding the optimal values of a small number of parameters controlling the distribution), the original optimal design problem is not being directly addressed and hence usually sub-optimal designs will be found. Overstall and Woods (2017) recently presented the first general methodology for finding highdimensional Bayesian optimal designs using the approximate coordinate exchange (ACE) algorithm. As will be described in Section 2.1, the main feature of this approach is the combination of the low-dimensional smoothing methodology of Müller and Parmigiani (1995) with a coordinate exchange, or cyclic ascent, algorithm (Meyer and Nachtsheim 1995;Lange 2013, p. 171). In essence, the high-dimensional, computationally expensive and often stochastic optimisation problem is reduced to a sequence of one-dimensional, computationally cheaper, and deterministic optimisations. In this paper, we describe the R package acebayes (Overstall et al. 2018c) which implements the ACE algorithm, and introduce functionality that facilitates finding optimal designs for common classes of models, including nonlinear and generalised linear models. The package provides the first general-purpose software for finding fully-Bayesian optimal designs. In addition, the package implements methods to find pseudo-Bayesian designs (see Section 3.2) using coordinate exchange and quadrature approximations (see Gotwalt et al. 2009). The package has been demonstrated by finding Bayesian optimal designs for non-trivial statistical models, addressing design spaces with dimensionality approaching two orders of magnitude greater than existing methods.
There are only a few other R packages that attempt to find optimal designs, and none that tackle the general Bayesian design problem addressed by acebayes. The package AlgDesign (Wheeler 2014) implements exchange-type algorithms to find D-, A-and I-optimal designs for linear models. The package OptimalDesign (Harman and Filova 2016) tackles similar linear model design problems using various algorithms including integer quadratic programming. For nonlinear models, package ICAOD (Masoudi et al. 2018) can be used to find designs for quite general classes of models under "optimum-on-average" criteria, amongst others. Such criteria are mathematically equivalent to "pseudo-Bayesian" design criteria, see Section 3. This package uses the meta-heuristic imperialist competitive algorithm (Masoudi et al. 2017). For special classes of nonlinear models, locally optimal designs, with a point mass prior for γ and utility functions not depending on y, can be found by packages LDOD (Masoudi et al. 2013), designGLMM (Bush and Ruggiero 2016) and PopED (Nyberg et al. 2012).
This paper is structured as follows. We briefly describe both the ACE algorithm and its implementation in acebayes in Section 2. Section 3 presents common utility functions em-ployed in Bayesian design, and discusses their computational approximation. In Section 4 we demonstrate the use of various functions in the acebayes package to find optimal Bayesian designs for common nonlinear and generalised linear models, and bespoke model selection and prediction problems. We conclude in Section 5 with a short discussion.

Approximate coordinate exchange and acebayes
In this section we give a brief description of the ACE algorithm. Full details of the methodology can be found in Overstall and Woods (2017).
The algorithm has two phases, both of which are provided in full in Appendix A. In Phase I a smooth, and computationally inexpensive emulator for the approximationŨ (d) in Equation 4 is maximised as a function of each design coordinate x ij in turn, conditional on the values of the other nk − 1 coordinates. In essence, the optimisation problem is solved via a sequence of computer experiments (see Santner et al. 2003).
For a stochastic approximation to the expected utility (e.g., Monte Carlo integration), the coordinate value that maximises each emulator is accepted with probability obtained from a Bayesian test of equality of the approximations under the proposed and current designs.
For a deterministic approximation (e.g., quadrature), the design proposed by the emulator is accepted if the value ofŨ (d) for the proposed design is larger than for the current design.
As Phase I tends to produce clusters of design points, Phase II of the algorithm can be applied to attempt to amalgamate these clusters through use of a point exchange algorithm with a candidate set formed from the points in the final Phase I design.

ACE algorithm
Phase I of the algorithm (Appendix A.1) uses cyclic ascent to maximise approximationŨ (d) to the expected utility. A one-dimensional emulator ofŨ (d) is built for each coordinate x ij (i = 1, . . . , n; j = 1, . . . , k) in turn as the mean of the posterior predictive distribution conditioned on a small number of evaluations ofŨ (d) and assuming a Gaussian process (GP) prior (see Rasmussen and Williams 2006, ch. 2). For the ijth coordinate, an emulator is constructed by (i) selecting a one-dimensional space-filling design, x 1 ij , . . . , x Q ij , with Q points; (ii) constructing the Q designs d q ij , with the qth design having ith run x q i = (x i1 , . . . , x ij−1 , x q ij , x ij+1 , . . . , x ik ) and all other runs equal to those from the current design; (iii) evaluatingŨ (d q ij ) for q = 1, . . . , Q; and (iv) fitting a GP regression model to the "data" {x q ij ,Ũ (d q ij )} Q q=1 and constructing an emulatorÛ ij (x) as the mean of the posterior predictive distribution. Maximisation of U ij (x) to obtain x ij is via evaluation of the emulator for a large discrete grid of values of x ij to produce design d ij with ith row (x i ) = (x i1 , . . . , x ij−1 , x ij , x ij+1 , . . . , x ik ). Overstall and Woods (2017) found this approach to maximisingÛ ij (x) to be robust to multi-modal emulators and computationally efficient due to the negligible computational expense of evaluating the predictive mean.
IfŨ (d) is a Monte Carlo approximation, it is subject to two sources of potential errors: Monte Carlo error and emulator inadequacy. To separate these components and reduce the impact of a poor emulator, d ij is only accepted as the next design in the algorithm with probability p obtained from a Bayesian hypothesis test, independent of the GP emulator (see Step 2d in Appendix A.1). Here p is calculated as the posterior probability that the expected utility for the proposed design d ij is greater than that for the current design, given independent Monte Carlo samples of the utility under each design and assuming normally distributed utility values. For cases where this latter assumption is violated, Overstall et al. (2018a) developed an alternative procedure derived from a one-sided test of a difference in proportions appropriate for use with, for example, a 0-1 utility function (see Equation 8 in Section 3.1). Larger Monte Carlo sample sizes are typically used for these tests than for the construction of the emulator to increase the precision of approximationŨ (d). Both tests are implemented in acebayes.
For a deterministicŨ (d), design d ij is accepted if its approximate expected utility, evaluated independently of the emulator, is greater than that of the current design.
Phase I can produce clusters of design points where, for example, design points x i and x i are separated by only a small Euclidean distance for some i, i = 1, . . . , n. Often, the design can be improved by consolidating such points into a single repeated design point (see also Gotwalt et al. 2009). Phase II of ACE (Appendix A.2) performs this consolidation step using a point exchange algorithm (e.g., Atkinson et al. 2007, ch. 12) with a candidate set given by the final design from Phase I. For Monte Carlo approximations to the expected utility, comparison of the approximate expected utility between two designs is again made on the basis of a Bayesian hypothesis test (see Step 6 of Appendix A.2).
In both phases, convergence is assessed informally using trace plots of the evaluations of approximate expected utility at each iteration. See Section 4.2 for an example of such a plot produced by acebayes.
Similar to all coordinate exchange algorithms (e.g., Goos and Jones 2011, pg. 36), ACE can be sensitive to the starting design. For this reason, it should be repeated from C different starting designs and acebayes provides methods to facilitate this repetition, potentially via simple parallel computing.

acebayes implementation of ACE
The main functions in the acebayes package are ace and pace, which implement both phases of the ACE algorithm and have mandatory and optional arguments as given in Tables 1 and 2, respectively. The ace function implements the ACE algorithm from a single starting design, whereas pace repeats ACE from C different starting designs. The argument utility gives the user complete flexibility to specify the design problem including the choice of statistical model, prior distribution, experimental aim and any necessary approximation to the utility function (see Section 3).
Much of the acebayes codebase is written in C++ and makes use of packages Rcpp (Eddelbuettel and Francois 2011) and RcppArmadillo (Eddelbuettel and Sanderson 2014). Space-filling designs to build the one-dimensional GP emulators are found using the R package lhs (Carnell 2016) which generates Latin hypercube samples.
To demonstrate the use of the ace and pace functions we use a simple Poisson response model for estimating a single parameter γ = (θ). Consider an experiment where the ith run involves specifying the k = 1 variable x i ∈ [−1, 1] and measuring the count response y i . Assume the following model; independently, for i = 1, . . . , n, with µ i = exp(x i θ). We assume a priori that θ ∼ N (0, 1). We

Argument Description utility
A function with two arguments: d and B. For a Monte Carlo approximation (deterministic = FALSE), it should return a vector of length B where each element gives an evaluation of the (approximate) utility functionũ(γ b , y b , d) for design d for each pair (γ b , y b ) generated from the joint distribution of γ and y for b = 1, . . . , B.
For a deterministic approximation (deterministic = TRUE), it should return a scalar giving the approximate value of the expected utility for design d. In this latter case, the argument B can be a list containing tuning parameters for the deterministic approximation. If B is not required, the utility function must still accept the argument, e.g., using the ... notation. start.d For ace: an n × k matrix specifying the starting design for Phase I (see Step 1 in Appendix A.1).
For pace: a list of C different starting designs. find a design that maximises the expectation of the following utility function is the Fisher information and d = (x 1 , . . . , x n ) . As u does not depend on y, the expected utility reduces to where π(θ) is the density of the standard normal distribution. It is straightforward to show that x 2 i exp x 2 i /2 , and the optimal design is d * = (±1, . . . , ±1) . However to demonstrate the use of the ace and pace functions, we employ a Monte Carlo approximation to U (d) in Equation 5. This approximation is implemented in the R function below which takes two arguments: an n × 1 matrix d and the Monte Carlo sample size B. It returns a vector of length B, where each element is an evaluation of u(θ, d) for a value of θ generated from the prior distribution. For a deterministic approximation (deterministic = TRUE), B may be a list of length two containing any necessary tuning parameters for the utility calculations for the comparison and emulation steps. Q The number, Q, of evaluations of the approximation to the expected utility function (1) used to construct the GP emulator. The default is Q = 20. N1 The number, N 1 , of iterations of Phase I. The default is N1 = 20. N2 The number, N 2 , of iterations of Phase II. The default is N2 = 100. lower Lower limits on the design space. It can be a scalar (all elements have the same lower limit) or an n × k matrix so that all elements can have unique lower limits.

mc.cores
For pace only. The number of cores to use, i.e. at most how many child processes will be run simultaneously. The default is mc.cores = 1. n.assess For pace only. If deterministic = FALSE, the approximate expected utility for the C final designs will be calculated as the mean of n.assess approximations to the expected utility for each design. We now call the function ace to demonstrate finding a design with n = 12 runs from a single starting design. The first mandatory argument utility is set to be the function defined above. The second mandatory argument start.d specifies the starting design; note that although k = 1, the starting design still needs to be an R matrix object. Here we use a matrix of n zeros. We keep all other arguments as their default values and set a seed for reproducibility.

R> ex22a
User-defined model & utility Number of runs = 12 Number of factors = 1

Number of Phase I iterations = 20
Number of Phase II iterations = 100 Computer time = 00:00:33 An "ace" object is a list which includes the final designs from Phase 1 (phase1.d) and Phase 2 (phase2.d) of the algorithm. If N1 = 0 (i.e., there are no Phase I iterations), then phase1.d will be equal to the argument start.d. Correspondingly, if N2 = 0 (i.e., there are no Phase II iterations), then phase2.d will be equal to phase1.d.
Consider now repeating the above implementation of ACE from C = 10 different randomly generated starting designs, where each element in the design is generated uniformly from [−1, 1]. The starting designs are organised into a list and we then call the function pace with the same utility argument as the call to ace above.

R> ex22b
User Computer time = 00:05:41 A pace object is a list which includes the Phase II design from each repetition (final.d) and, from those, the design (d) found with the largest approximate expected utility.
To compare two designs, acebayes provides the the S3 method assess. It takes two mandatory arguments: d1 and d2, specifying the two designs to be compared. The argument d1 should be either a ace or pace object and the two designs will be compared on the basis of the expected utility used for d1. The argument d2 should either be a ace, pace or matrix object. We use assess to compare the single starting design of n zeros, to the final designs from a single repetition of ACE and from C repetitions. In cases like this, where the approximation to the expected utility is not deterministic, assess will calculate n.assess approximations to the expected utility where n.assess is an optional argument. We set n.assess to be 100. The function assess will return an object of type assess. For a non-deterministic approximation to the expected utility, when printed, an assess object will show the mean and standard deviation of the n.assess evaluations of the approximate expected utility for each design.
R> assess(d1 = ex22a, d2 = matrix(rep(0, n), ncol = 1), n.assess = 100) We can clearly see the improvement in approximate expected utility from the design of n zeros and the designs found by ACE. It appears that, in this case, the C repetitions of ACE have not led to any improvement in the design performance. This can be seen by plotting the assess object providing side-by-side boxplots of the n.assess evaluations of the approximate expected utility for each design. R> assess22 <-assess(d1 = ex22a, d2 = ex22b, n.assess = 100) R> plot(assess22) The resulting plot is shown in Figure 1 from which we can see that the performance of the designs are very close. This can be confirmed by inspecting the two designs and noting that they both consist only of values x ± 1, i.e. both are optimal.

Utility functions and approximations
Prior to application of the ACE algorithm, a relevant, and perhaps pragmatic, choice of utility function must be made that encapsulates the aim of the experiment. In Section 4, we illustrate use of functions from acebayes by finding efficient designs under three common utility functions.
A SIG-optimal design that maximises the expectation of u SIG equivalently maximises the expected Kullback-Liebler divergence between the prior and posterior distributions (Chaloner and Verdinelli 1995).
2. Negative squared error loss (NSEL; e.g., Chaloner 1984): A NSEL-optimal design that maximises the expectation of u N SEL equivalently minimises the expected trace of the posterior variance matrix. Note that the use of this utility is not appropriate for nominal or ordinal γ (for example, if γ holds binary model indicator variables).
3. 0-1 utility (e.g., Felsenstein 1992): where M l (y, d) = arg max γ l π(γ l |y, d) is the marginal posterior mode of γ l , I(A) is the indicator function for event A, and δ l ≥ 0 is a specified tolerance. This utility is only non-zero if the posterior mode is "close" to γ l for all l = 1, . . . , p. Setting δ l = 0 is typically only appropriate for discrete γ l .

Approximating utility functions
Most utility functions, including those in Section 3.1, require approximation of posterior quantities, for example the marginal likelihood, π(y|d), or posterior mean, E (γ|y, d). Such quantities are analytically intractable for most models. Here we review those methods implemented in acebayes to produce approximate utilitiesũ(γ, y, d).
Overstall and Woods (2017) used Monte Carlo approximations, with a sample {γ b }B b=1 from π(γ|d), to approximate u SIG and u N SEL (in Equations 6 and 7, respectively) to enable design selection for parameter estimation, i.e., with γ = θ, for a single model. When combined with a Monte Carlo approximation to the expected utility with sample size B, the resulting nested Monte Carlo (or double-loop Monte Carlo) approximation to U (d) is subject to bias of order B −1 (see Ryan 2003). Hence, large values of both B andB are required to achieve suitable precision for design comparison and neglible bias, resulting in computationally expensive utility approximations.
Although the use of the one-dimensional emulatorsÛ ij (x) helps to alleviate the computational cost associated with a nested Monte Carlo approximation, the adoption of alternative, cheaper, utility approximations can further increase the range and size of design problems that can be addressed. Several classes of analytical approximations have been proposed using normal approximations to the posterior distribution. Overstall et al. (2018a) applied ACE with a normal approximation to the posterior distribution with mean equal to the posterior mode and variance-covariance matrix equal to the inverse of the expected Fisher information, I(θ; d), minus the second derivative of the log prior density, both evaluated at the posterior mode. Such an approximation can lead to analytically tractable, if still potentially biased, normal-based approximationsũ(γ, y, d); for example, via a Laplace approximation to the marginal likelihood (see also Long et al. 2013).
Simpler approximations to some utilities can be obtained by using I(θ; d) −1 as an approximation to the posterior variance-covariance matrix (e.g., Chaloner and Verdinelli 1995). For example, for estimation of γ = θ, approximations to the SIG and NSEL utility functions are given byũ u N SELA (θ, y, d) = −tr I(θ; d) −1 .
Designs that maximise the expectation ofũ SIGD andũ N SELA with respect to the prior distribution of θ are referred to as pseudo-Bayesian D-and A-optimal, respectively.

Approximating the expected utility
The acebayes package uses expected utility approximations of the form given in Equation 4. For the nested Monte Carlo and normal-based approximations, U (d) is approximated by the sample mean ofũ(γ b , y b , d) for a sample {γ b , y b } B b=1 from π(γ, y|d). Default values of B (andB for nested Monte Carlo) are B = 1000 for constructing the one dimensional emulatorŝ U ij (x) and B = 20, 000 when calculating the probability of accepting the proposed design (see Section 2.1).
For pseudo-Bayesian D-and A-optimal design, where the approximations given by Equations 9 and 10 do not depend on y, the p-dimensional integrals with respect to γ = θ can be approximated using quadrature methods. The acebayes package implements a radial-spherical integration rule (Monahan and Genz 1997), with γ b and ω b in Equation 4 being abscissas and (non-constant) weights, respectively. For small p, the value of B is typically of the order of several hundred, making this approach much less computationally intensive than either nested Monte Carlo or normal-based methods. Both multivariate normal and independent uniform prior densities are implemented for use with quadrature approximations in acebayes. See Gotwalt et al. (2009) for more details on using this quadrature scheme to find pseudo-Bayesian D-optimal designs.

Examples
In this section, we demonstrate the use of the acebayes package to find Bayesian optimal designs for four examples. Despite ACE being able to find efficient designs for larger and more complex problems than existing methods in the literature, it still requires significant computational resources. Hence we have chosen examples that illustrate the main features of the methodology and package but that do not require excessive computer time to complete. It should be clear how the examples can be extended to address more complex or realistic scenarios. In particular, the main arguments to the functions ace and pace are essentially identical. Therefore, to minimise the computer time taken to reproduce the examples, we only demonstrate the pace functionality in Sections 4.1 and 4.2 (in addition to that already demonstrated in Section 2.2).
A particular feature of this section is demonstration of the functions aceglm and acenlm, which simplify the process of finding Bayesian optimal designs for generalised linear models and nonlinear models, respectively. The ace function allows designs to be sought for very general problems. This flexibility comes at the price that non-expert users may feel uncomfortable with the level of additional coding required to use the function. To remove this potential barrier to the use of the package, aceglm and acenlm provide wrappers to ace that implement the ACE algorithm for these common model types. In both cases, the functions allow designs to be found for parameter estimation under a single model for a range of (approximated) utility functions. There are also corresponding functions, paceglm and pacenlm, which implement repetitions of ACE for generalised linear models and nonlinear models, respectively. Both (p)aceglm and (p)acenlm make use of the familiar formula and family R arguments and objects. In Sections 4.1 and 4.2 we demonstrate, in detail, the use of these functions.
In each case, unless otherwise stated, we use a randomly generated Latin hypercube spacefilling design (see Santner et al. 2003, ch. 5) as the starting design for the ACE algorithm. These designs are generated using the randomLHS function in the lhs package (Carnell 2016).
Each element of the starting design is scaled to lie in the stated design space. Additionally, before we generate such a design, we set a random seed for full reproducibility of the results in this section.

Compartmental non-linear model
In this example, we demonstrate using acenlm to generate a pseudo-Bayesian D-optimal design for a compartmental model commonly used in pharmacokinetics (PK). The acenlm and pacenlm functions can find optimal designs for models of the form where y 1 , . . . , y n are assumed independently distributed and the user specifies a non-linear function, µ(θ; x i ), of parameters θ, and prior distributions for both θ and σ 2 > 0, the unknown response variance.
A PK experiment typically involves introducing a fixed amount of drug to the body at time zero and measuring at times t 1 , . . . , t n the amount of drug remaining in the body. Hence the design consists of the n sampling times, i.e., x i = (t i ), k = 1 and d = (t 1 , . . . , t n ) . Here we assume a sampling interval such that t i ∈ [0, 24] hours. We illustrate (p)acenlm on the compartmental model with p = 3 unknown parameters θ = (θ 1 , θ 2 , θ 3 ) .
For non-linear models of the form of Equation 11, the Fisher information for θ is Following Gotwalt et al. (2009), we find designs with n = 18 sampling times and assume that elements of θ have the following independent prior distributions: with θ 3 having a prior point mass at 21.8.
To find a pseudo-Bayesian D-optimal design using acenlm we first specify one starting design having a single column named "t" and with elements scaled to the sampling interval [0, 24].
R> set.seed(1) R> n <-18 R> k <-1 R> p <-3 R> start.d <-randomLHS(n = n, k = k) * 24 R> colnames(start.d) <-c("t") We use quadrature to approximate the expected utility, and hence define the prior below as a list containing a single matrix support, where the rows specify the lower and upper limits of the uniform prior distribution for each parameter, respectively; see Equation 12. For use with the acenlm function, the columns of this matrix should be named. As I(θ; d) only depends linearly on 1/σ 2 , the relative performance of designs under pseudo-Bayesian criteria do not depend on the unknown σ 2 . Hence it is not required to specify a prior distribution for σ 2 .
R> ex411 <-acenlm(formula =~theta3 * (exp( -theta1 * t) -+ exp( -theta2 * t)), start.d = start.d, prior = prior, lower = 0, + upper = 24) For demonstration purposes, we compare the designs found from Phase I and II using the S3 method assess (as introduced in Section 2.2). In cases of a deterministic approximation to the expected utility, assess will calculate one approximation to the expected utility for each of d1 and d2. Furthermore, when d1 is as a result of a call to acenlm or aceglm with the criterion argument being "D" or "A", then assess will also approximate, again using quadrature, the pseudo-Bayesian relative D-or A-efficiency. The pseudo-Bayesian relative D-efficiency of design d 1 relative to design d 2 , defined as provides a quantitative comparison of two designs. Here, Similar relative efficiency exists for pseudo-Bayesian A-optimal designs. The function assess calculates these efficiencies for pseudo-Bayesian D and A-optimal designs.
R> assess(d1 = ex411, d2 = ex411$phase1.d) Approximate expected utility of d1 = 15.79695 Approximate expected utility of d2 = 15.70753 Approximate relative D-efficiency = 103.0255% Here we see that Phase II led to a small increase in the expected utility. We can also compare the two designs in terms of the number of unique sampling times.
It is quite common in PK, and similar, experiments for there to be constraints on the minimum time between successive measurements. For such experiments, the ordered sampling times, t (1) , . . . , t (n) , should satisfy the following constraint: See Ryan et al. (2014) and Overstall and Woods (2017) for examples with similar constraints on the design for the compartmental model.
We can include such constraints in Phase I of the ACE algorithm using the limits argument. In Phase I, the candidate design is chosen by replacing the current coordinate (i.e., x ij , i = 1, . . . , n; j = 1, . . . , k) by the value that maximises the predictive mean of the Gaussian process emulator. As discussed in Section 2.1, this maximisation is achieved by choosing the point with largest predicted mean from a grid of points, typically on an interval defined by the arguments lower and upper. We can also specify the argument limits as a function to incorporate (multivariate and dynamic) constraints on the design coordinates. The function should have three arguments: d, i and j. Here d specifies the design, and i and j define the current coordinate by specifying the row and column of d. The function should return a grid of points (in the form of a vector) from which the point with the largest predicted mean will be chosen.
The code below defines a limits function to incorporate the constraint given by Equation 13, with c = 0.25 (i.e., 15 minute intervals between sampling). A grid of 10,000 points from lower = 0 to upper = 24 is created. We then remove all points from this grid that are within 15 minutes of the sampling times in the current design excluding the ith point. Note that here the function does not depend on j as the design involves only k = 1 factor. We find a design satisfying the constraint by including the specification of the limits argument in the acenlm function and setting the number of Phase II iterations to N 2 = 0 (as we do not want to consolidate clusters into repeated sampling times; see the constraint given by Equation 13).
The only exception is that the argument start.d should be a list where each element is an n × k matrix. Below we specify such a list with C = 10.
R> assess(d1 = ex412a, d2 = ex412b) Approximate expected utility of d1 = 15.34813 Approximate expected utility of d2 = 15.36236 Approximate relative D-efficiency = 99.52679% The design found under one repetition (ex412a$phase2.d) is about 99.5% D-efficient compared to the design found under C = 10 repetitions. In this case, the procedure was relatively robust to the starting design.

Logistic regression
In this section we consider a logistic regression model from Overstall and Woods (2017) to demonstrate the use of the aceglm and paceglm functions. We find designs that maximise the expected NSEL utility for estimation of parameters γ = θ using two different approximations to the utility function: 1) approximation given by Equation 10, resulting in a pseudo-Bayesian A-optimal design; and 2) the normal-based approximation of Overstall et al. (2018a).
Independent uniform prior distributions for each element of θ are assumed, The aceglm function has four mandatory arguments. The arguments formula and family are the well-known arguments we would supply to the glm function in the stats package (R Core Team 2018) and are used to define the logistic regression model, i.e., formula =~x1 + x2 + x3 + x4 and family = binomial. The argument start.d specifies the starting design, an n × k matrix with columns named as per the terms in the formula argument; and prior specifies the prior distribution for θ, with the input for this argument depending on the chosen method (see below). Initially, we specify values for n, p and k, and generate a list of C = 10 starting designs.

Pseudo-Bayesian A-optimal design
We start by finding a pseudo-Bayesian A-optimal design by specifying criterion = "A". Note that the default value is criterion = "D" which would result in a pseudo-Bayesian D-optimal design using the approximation given in Equation 9. To approximate the expected utility we use the radial-spherical quadrature rule discussed in Section 3.3. This can be specified by setting the method argument in paceglm to "quadrature", which is the default for pseudo-Bayesian criteria (i.e., criterion being "A", "D" or "E"). Under this method, to specify the uniform prior distribution given by Equation 14, we define a list with a single matrix argument support with the first row specifying the lower limits for the prior for each parameter, and the second row specifying the upper limits. The other optional arguments to paceglm are shared with ace (see Table 2). The exception is the argument deterministic, which is not required for (p)aceglm and (p)acenlm as the method argument specifies if the approximation to the expected utility is deterministic. We leave these optional arguments set to default values but demonstrate their use in later examples.

Normal-based approximation to the NSEL utility function
To apply the normal-based approximation to the NSEL utility function we let criterion = "NSEL-Norm" in the paceglm function. This changes the default method argument to "MC" implementing Monte Carlo. Under this method, we require an R function that generates a sample from the prior distribution given by Equation 14. R> ex412 <-paceglm(formula =~x1 + x2 + x3 + x4, family = binomial, + start.d = start.d, prior = prior, criterion = "NSEL-Norm") We can check the approximate convergence of the ACE algorithm using the S3 method plot.ace.

R> plot(ex412)
This function produces a trace plot (see Figure 2) of the Monte Carlo approximation to the expected utility (1) against iteration number for Phases I and II and the design that had the highest expected utility from the C repetitions. In Phase I, the algorithm makes very large initial improvements to the approximate expected utility, and appears to have converged after six or seven iterations. Phase II does not appear to lead to any improvements in the design, as also occurred when finding the pseudo-Bayesian A-optimal design.
We now compare the pseudo-Bayesian A-optimal design (ex411$d) to the design found under the normal-based approximation to the NSEL utility (ex412$d). Notice how the optimal design under the normal-based approximation (ex412$d) achieves a substantially larger expected NSEL utility than the pseudo-Bayesian A-optimal design (ex411$d). Overstall and Woods (2017) and Overstall et al. (2018a) empirically investigated how the difference between NSEL and A-optimal designs decreases as n, the number of runs, increases, as the asymptotic approximation underpinning pseudo-Bayesian optimal designs improves. Overstall et al. (2018a) also found that, for this example, the difference between designs found using ACE with nested Monte Carlo and normal-based approximations to the expected NSEL utility were negligible, regardless of the value of n. However, designs under the normal-based approximation typically took around one-third of the computational time to find.

Model selection for chemical reactions
In this example, we demonstrate using the ace function for a problem that falls outside the capabilities of aceglm and acenlm, and illustrate construction of a bespoke utility function.
This example is adapted from Box and Hill (1967) and concerns mechanistic modelling of chemical reactions. We find designs with n runs where the ith run requires specifying the reaction time, x i1 ∈ (0, 150), and temperature, x i2 ∈ (450, 600), at which to measure reaction yield, y i (i = 1, . . . , n). Therefore each run of the design is a two-vector x i = (x i1 , x i2 ) . The following statistical model is posited: with unknown parameters θ = (θ 1 , θ 2 ) and m ∈ M = {0, 1, 2, 3} specifying the order of the reaction, with m = 0, 1, 2, 3 corresponding to first-, second-, third-and fourth-order reactions, respectively.
We create an R function to implement η(θ; x i ) with arguments d, an n×k matrix, and theta, a B × p matrix with bth row given by θ b = (θ 1b , θ 2b ) . It returns a B × n matrix with bith element given by θ 1b x i1 exp (−θ 2b x i2 ). That is, it calculates the value of the function η for a given design d for every row of a matrix of parameter values theta. The aim of the experiment is to determine which order reaction is appropriate for the observed responses, i.e., a choice from the set M = {0, 1, 2, 3}. That is, model selection with γ = (m) in u(γ, y, d). Following Overstall et al. (2018a), identical prior distributions are assumed for the parameters under each model: We fix the response standard deviation as σ = 0.1, double the value assumed by Box and Hill (1967). We choose this larger value as use of too small a value of σ leads to the expected utility becoming less dependent on the design, i.e., the expected utility surface is quite flat. Equal prior probabilities are assumed for each model, i.e., π(m) = 1/4, for all m ∈ M.
The expected 0-1 utility can then be approximated as for a sample {m b , θ b , y b } B b=1 from the joint distribution of m, θ and y; such a sample can be easily generated by sampling a model indicator from M with probabilities π(m), parameters from the prior distribution with density π(θ), and then responses from the conditional distribution with density π(y|m, θ) (see the code below).
We can now create an R function, util01, to implement this approximate utility. Note that we setB = 100. The function takes as arguments a design d and Monte Carlo sample size B.
The main work is done within the nested for loop; for each data set generated in the outer loop, the posterior modal model is found by maximising a Monte Carlo approximation to the marginal likelihood, which is calculated in the inner loop. It returns a vector of B evaluations of the Monte Carlo approximated utility functionũ(m, y, d). We use the ace function along with util01 to find an optimal design. For illustration, we search for a design with n = 20 runs and set the arguments B = c(1000, 100), Q = 15, N2 = 0 (to skip Phase II), and using the arguments lower and upper to set the bounds for each factor (see Table 2). We specify a Bayesian hypothesis test of a difference in proportions by setting binary = TRUE.
We compare the approximations to the expected 0-1 utility for the starting design and the final design from ACE as follows.
R> assess43 <-assess(d1 = ex43, d2 = start.d) R> assess43 Mean (sd) approximate expected utility of d1 = 0.8789 (0.00611211) Mean (sd) approximate expected utility of d2 = 0.80565 (0.0120973) Assuming the data generating process is consistent with one of the models, the starting design identifies the true model about 81% of the time. By using an optimal design we can increase this to about 88%.

Optimal design for prediction
Prediction from a nonparametric regression model is a common aim of both spatial studies and computer experiments, often using Gaussian process (GP) regression (or Kriging). For example, optimal sensor placements, e.g., for pollution or environmental monitoring, may be sought within a geographical region of interest to provide accurate predictions at unsampled locations. See Diggle and Lophaven (2006), Zimmerman (2006) and Uciński and Patan (2007). As the cost of maintaining large monitoring networks can be high, costs are often also associated with the placement of each sensor. We demonstrate using acebayes to find an optimal design for such a problem.
Here, a design consists of n locations x i = (x i1 , x i2 ) , within a specified two-dimensional region, at each of which a response y i will be observed. The aim is to fit a Gaussian process model to the resulting responses, y = (y 1 , . . . , y n ) , and to predict the n 0 unobserved responses, y 0 = (y 01 , . . . , y 0n 0 ) , where y 0i is associated with pre-specified location x 0i = (x 0i1 , x 0i2 ) , for i = 1, . . . , n 0 . Letỹ = y , y 0 denote theñ × 1 vector of observed and unobserved responses whereñ = n + n 0 .
The design problem is to specify the n × 2 matrix d (with ith row x i ) to best predict y 0 where the meaning of "best" is controlled by the choice of utility function (see later).
A zero-mean Gaussian process model results in the assumption of multivariate normal distribution forỹ,ỹ where σ 2 > 0 is a scale parameter andΣ =C + τ 2 Iñ.
R> k <-2 R> r <-10 R> n0 <-r^k R> x0 <-seq(from = 0, to = 1, length.out = r) R> d0 <-as.matrix(expand.grid(x0, x0)) We find optimal designs for prediction using a utility function adapted from Sansó and Müller (1997) and Müller et al. (2004) which compromises between the accuracy of the posterior predictive mean, E(y 0 |y), at the n 0 new locations x 01 , . . . , x 0n 0 against the cost of the n placed sensors: where δ > 0 controls the desired accuracy and c(x i ) is the cost of taking an observation at x i . In this example, the cost c(x i ) depends on the location of the proposed sensor and is given by c(x i ) = x 2 i1 + x 2 i2 , i.e., the squared Euclidean distance from the origin. The total cost of the design is given by n i=1 c(x i ). Under the model given by Equation 17, the posterior predictive mean of y 0 is given by E(y 0 |y) = S C + τ 2 I n −1 y , with ith element E(y 0i |y), for i = 1, . . . , n 0 .
R> sum(apply(start.d^2, 1, sum)) [1] 6.653732 R> sum(apply(ex44$phase2.d^2, 1, sum)) [1] 3.081133 The difference in utility values between the Bayesian optimal design and the starting design are mostly due to the much lower cost of the sensor placements in the optimal design. That is, the optimal design has similar predictive accuracy as the starting design but at substantially reduced cost. Figure 3 shows the very different distribution of the points in the optimal design compared to the starting design, with the former having many more points in areas of low cost near the origin.

Discussion
Bayesian optimal design is conceptually straightforward but often difficult, and computationally expensive, to implement. The acebayes package provides a suite of functions that allow optimal designs to be found for complex and realistic problems, with dimensionality at least one order of magnitude greater than other current methods. The general purpose ace and pace functions can be used to solve very general, and bespoke, design problems. The functions (p)aceglm and (p)acenlm can find designs for common classes of statistical models. Any set of examples can only be illustrative, and those in this paper are no different. To aid exposition, we have deliberately kept the problems relatively simple and the designs sought have been small. However, acebayes has been used to find designs for more complex scenarios. These include high-dimensional design spaces  and ordinary differential equation models (Overstall et al. 2018d). It has also been used (Overstall and McGree 2018) to find design for intractable likelihood models (e.g.  and for synthesis optimisation of pharmaceutical products (Overstall et al. 2018b).
While acebayes allows much larger designs to be found than existing methods, for complex problems we would recommend coding the utility function in a low-level programming language (e.g., C/C++) and running the code on a computational cluster. The algorithm is heuristic, and so to overcome convergence to local optima, it should be run multiple times from different starting designs, for example using parallel computing and/or the pace function.
We have created a YouTube channel (https://bit.ly/2SSC3ur) hosting tutorial videos and a Twitter account (@acebayes) to update users on the latest developments to the acebayes package.
Acknowledgments D.C. Woods was partially supported by a Fellowship EP/J018317/1 from the UK Engineering and Physical Sciences Research Council. The authors are grateful for constructive comments from two anonymous reviewers that improved the paper.

A. The approximate coordinate exchange algorithm
This appendix provides details on Phase I (Appendix A.1) and Phase II (Appendix A.2) of the ACE algorithm.
A.1. Phase I 1. Choose an initial design d 0 ∈ D and set the current design to be d C = d 0 .
(a) Let d C (x q ij ) equal d C with ijth coordinate (entry) replaced by x q ij , where x 1 ij , . . . , x Q ij are the points from a one-dimensional space filling design in D ij , the design space for the ijth element of d.
(c) Find x ij = arg max x∈D ijÛ ij (x) , and set d = d C (x ij ).
(d) For a stochastic (e.g., Monte Carlo) approximationŨ , set d C = d with probability p * derived from a Bayesian hypothesis test. For a deterministic approximation (e.g., quadrature), set d C = d ifŨ (d ) >Ũ (d C ).