A design criterion for symmetric model discrimination based on flexible nominal sets

Abstract Experimental design applications for discriminating between models have been hampered by the assumption that it is known beforehand which model is the true one, which runs counter to the very aim of the experiment. Previous approaches to alleviating this requirement were either symmetrizations of asymmetric techniques, or Bayesian, minimax, and sequential methods. Here we present a genuinely symmetric criterion based on a linearized distance between mean-value surfaces and the newly introduced tool of flexible nominal sets. We demonstrate the computational efficiency of the approach using the proposed criterion and provide a Monte-Carlo evaluation of its discrimination performance on the basis of the likelihood ratio. An application for a pair of competing models in enzyme kinetics is given.

When the models are nested and (partly) linear, T-optimality can be shown to be equivalent to Ds-optimality for a single parameter that embodies the deviations from the smaller model (see, e.g., Dette & Titoff, 2009; Stigler, 1971). For this setting the optimal design questions are essentially solved, and everything hinges on the asymmetric nature of the Neyman-Pearson lemma with respect to the null and alternative hypotheses. However, in the nonnested case the design problem itself is often inherently symmetric with respect to the exchangeability of the compared models, and it is the purpose of the experiment to decide which of the two different models is true.
The aim of this paper is to solve the discrimination design problem in a symmetric way, focusing on nonnested models. Thus, standard methods that are inherently asymmetric, such as T-optimality, albeit feasible, are not a natural choice. We further suppose that we do not use a full prior distribution for the unknown parameters of the models, which rules out Bayesian approaches such as Felsenstein (1992) and Tommasi and López-Fidalgo (2010). Nevertheless, as we will make more precise in the next section, we will utilize what can be perceived as a specific kind of prior knowledge about the unknown parameters, extending the approach of local optimality. Our goal is to provide a lean, computationally efficient, and scalable method, as opposed to the heavy machinery recently employed in the computational statistics literature, for example, Hainy, Price, Restif, and Drovandi (2018). Furthermore, we strive for practical simplicity, which at first prohibits sequential (see Buzzi-Ferraris & Forzatti, 1983; Müller & Ponce De Leon, 1996; Schwaab et al., 2006) or sequentially generated (see Vajjah & Duffull, 2012) designs.
A standard solution to the symmetric discrimination design problem is to employ symmetrizations of asymmetric criteria, such as compound T-optimality, which usually depend on some weighting chosen by the experimenter. The minimax strategy recently presented in Tommasi, Martín-Martín, and López-Fidalgo (2016) is also essentially a symmetrization. Moreover, the usual minimax approaches lead to designs that depend entirely on the possibly unrealistic extreme values of the parameter space, and their calculation again demands enormous computational effort.
The closest in spirit to our approach is perhaps a proposal for linear models in Section 4.4 of Atkinson and Fedorov (1975) and its extension in Fedorov and Khabarov (1986), which, however, was not taken up by the literature. The probable reason is that it involves some rather arbitrary restrictions on the parameters, as well as an artificial lower bound needed to convert it into a computationally feasible optimization problem.
For expositional purposes we will now restrict ourselves to a rather specific design task but will discuss possible extensions at the end of the paper.
Let X ≠ ∅ be a finite design space and let ξ be a design on X, that is, a vector of design points x_1, …, x_n ∈ X, where n is the chosen size of the experiment. Hence, in the terminology of the theory of optimal experimental design, we will work with exact designs. We will consider discrimination between a pair of nonlinear regression models y_i = η_j(θ_j, x_i) + ε_i, i = 1, …, n, j = 0, 1, where y_1, …, y_n are observations, η_0 : Θ_0 × X → ℝ and η_1 : Θ_1 × X → ℝ are the mean-value functions, Θ_0 ⊆ ℝ^{m_0} and Θ_1 ⊆ ℝ^{m_1} are parameter spaces with nonempty interiors int(Θ_0), int(Θ_1), and ε_1, …, ε_n are unobservable random errors. For both j = 0, 1 and any x ∈ X, we will assume that the functions η_j(·, x) are differentiable on int(Θ_j); the gradient of η_j(·, x) at θ_j ∈ int(Θ_j) will be denoted by ∇η_j(θ_j, x). Our principal assumption is that one of the models is true, but we do not know which; that is, for j = 0 or for j = 1 there exists θ̄_j ∈ Θ_j such that y_i = η_j(θ̄_j, x_i) + ε_i for all i.
Let the random errors ε_i be i.i.d. N(0, σ²), where σ² ∈ (0, ∞). The assumption of the same error variance for both models is plausible if, for instance, the errors are due to the measurement device and hence do not significantly depend on the value being measured. The situation with different error variances requires a more elaborate approach; compare Fedorov and Pázman (1968).
Ultimately, we aim not just at achieving high design efficiencies with respect to our newly proposed criterion, but also want to test its usefulness in concrete discrimination experiments, that is, the probability that, using our design, we arrive at the correct decision about which model is the true one. So, to justify our approach numerically, we require a model-discrimination rule that will be applied for evaluation purposes after all observations based on the design ξ are collected.
The choice of the best discrimination rule based on the observations is generally a nontrivial problem. However, it is natural to compute the maximum likelihood estimates θ̂_0 and θ̂_1 of the parameters under the assumption of the first and the second model, respectively, and then base the decision on whether the likelihood ratio L(θ̂_0)/L(θ̂_1) is smaller or greater than 1, or, perhaps more simply, whether log L(θ̂_0) − log L(θ̂_1) <> 0. Under the normality, homoskedasticity, and independence assumptions, this decision is equivalent to a decision based on the proximity of the vector (y_i)_{i=1}^n of observations to the vectors of estimated mean values (η_0(θ̂_0, x_i))_{i=1}^n and (η_1(θ̂_1, x_i))_{i=1}^n. For the case m_0 ≠ m_1, to counterbalance the favoring of models with a greater number of parameters, Cox (2013) recommends instead adjusting the likelihood ratio L(θ̂_0)/L(θ̂_1) by a penalty factor depending on m_0, m_1, and ñ, which corresponds to the Bayesian information criterion; see Schwarz (1978). Here ñ corresponds to the number of observations in a real or fictitious prior experiment. For the sake of simplicity, however, we will now restrict ourselves to the case m := m_0 = m_1. Note that for the evaluation purposes we are taking a purely model-selection-based standpoint. More sophisticated testing procedures, for instance allowing both models to be rejected, based on the pioneering work of Cox (1961), are reviewed and outlined in Pesaran and Weeks (2007). Let x_1, …, x_n ∈ X and let ξ = (x_1, …, x_n) be the design used for the collection of data prior to the decision, and assume that model 0 is true, with the corresponding parameter value θ̄_0. Note that this comes without loss of generality and symmetry, as we can equivalently assume model 1 to be true. Then the probability of the correct decision based on the likelihood ratio is equal to the probability (2) that the vector of observations lies closer to the estimated mean-value vector of model 0 than to that of model 1, where (y_i)_{i=1}^n follows the normal distribution with mean (η_0(θ̄_0, x_i))_{i=1}^n and covariance σ²I_n. Clearly, probability (2) depends on the true model, the unknown true parameter, and also on the unknown variance of the errors.
Even if these parameters were known, the probability of correct classification would be very difficult to compute for a given design, because this requires a combination of high-dimensional integration and multivariate nonconvex optimization. Therefore, it is practically impossible to directly optimize the design based on formula (2). However, we can simplify the problem by constructing a lower bound on (2) that does not depend on unknown parameters and is much simpler to maximize with respect to the choice of the design. The bound is based on the distance d(E_0, E_1), where E_j is the set of all possible mean values of the observations under model j, j = 0, 1, and d(·, ·) denotes the infimum distance between all pairs of elements of two sets.
To make (2) as high as possible, it makes sense to maximize (3), that is, to maximize d(E_0, E_1), which depends on the underlying experimental design. Although this maximization is much simpler than maximizing (2) directly, it still generally requires nonconvex multidimensional optimization at each iteration of the maximization procedure, which is impractical for computing exact optimal designs. A realistic approach must be numerically feasible and must address the dependence of the design on the unknown true model parameters, which we achieve by a rapidly computable approximation of d(E_0, E_1) through linearization, as explained in the following section.

Example 1: A motivating example
Let η_0(θ_0, x) = θ_0 x and η_1(θ_1, x) = exp(θ_1 x). Furthermore, for the moment we assume just two observations y_1, y_2 at the fixed design points x_1 = −1 and x_2 = 1, respectively. In this case, evidently θ̂_0 = (y_2 − y_1)/2, and θ̂_1 = log z, where, for −2 ≤ y_1, y_2 ≤ 2, z is the unique positive root of the polynomial z⁴ − y_2 z³ + y_1 z − 1. Figure 1 displays the log-likelihood-ratio contours for the original and linearized models; it is obvious that the former are nonconvex and complex, while the latter are much simpler, convex, and approximate fairly well for a wide range of responses. Note that while this example uses a fixed design, it motivates why the linearizations can serve as the cornerstones of our design method, as will become clearer in the following sections.
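Under the assumptions of this example, the likelihood-ratio decision can be checked numerically. The following Python sketch (function and variable names are ours, not from the paper) computes θ̂_0 in closed form, recovers θ̂_1 from the quartic above, and decides by comparing residual sums of squares, which is equivalent to the likelihood-ratio rule under i.i.d. normal errors:

```python
import numpy as np

def discriminate(y1, y2):
    """Likelihood-ratio decision for the motivating example:
    model 0: y = theta0 * x + eps, model 1: y = exp(theta1 * x) + eps,
    observed at x1 = -1 and x2 = 1.  Under i.i.d. normal errors with
    equal variance, the LR rule reduces to comparing residual sums of
    squares; returns 0 or 1, the index of the selected model."""
    # Least-squares estimate under model 0 (line through the origin).
    theta0 = (y2 - y1) / 2.0
    rss0 = (y1 + theta0) ** 2 + (y2 - theta0) ** 2
    # Under model 1 the stationarity condition reduces, with z = exp(theta1),
    # to the quartic z**4 - y2*z**3 + y1*z - 1 = 0; keep positive real roots.
    roots = np.roots([1.0, -y2, 0.0, y1, -1.0])
    pos = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0]
    rss1 = min((y1 - 1.0 / z) ** 2 + (y2 - z) ** 2 for z in pos)
    return 0 if rss0 < rss1 else 1
```

For instance, data generated exactly from the linear model at θ_0 = 1 (i.e., y = (−1, 1)) are classified as model 0, while data generated exactly from the exponential model at θ_1 = 1 are classified as model 1.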

THE LINEARIZED DISTANCE CRITERION
We suggest an extension of the idea of local optimality used in nonlinear experimental design. Let θ̃_0 ∈ int(Θ_0) and θ̃_1 ∈ int(Θ_1) be nominal parameter values, which satisfy the basic discriminability condition η_0(θ̃_0, x) ≠ η_1(θ̃_1, x) for some x ∈ X. Let us introduce regions Θ̃_0 ⊆ int(Θ_0) ⊆ ℝ^{m_0} and Θ̃_1 ⊆ int(Θ_1) ⊆ ℝ^{m_1} containing θ̃_0 and θ̃_1; we will consequently call Θ̃_0 and Θ̃_1 flexible nominal sets. It is evident that optimal designs depend on the parameter spaces in the same way as on our flexible nominal sets (cf. Dette, Melas, & Shpilev, 2013), but the latter will not be considered fixed like the parameter spaces Θ_0 and Θ_1. A novelty of our procedure is that we use these sets as a tuning device.
Let  = ( 1 , … , ) be a design. Let us perform the following particular linearization of Model =0,1 iñ: where () is the × matrix given by is a vector of independent (0, 2 ) errors. Note that for the proposed method the vector () plays an important role and, although it is known, we cannot subtract it from the vector of observations, as is usual when we linearize a single nonlinear regression model. However, if corresponds to the standard linear model then () = for any . F I G U R E 2 Illustrative graph for the definition of () for two one-parameter models (Θ 0 , Θ 1 ⊆ ℝ) and a design of size two ( = ( 1 , 2 )). The line segments correspond to the sets { 0 () + 0 () 0 ∶ 0 ∈Θ 0 } and { 1 () + 1 () 1 ∶ 1 ∈Θ 1 } for some flexible nominal setsΘ 0 andΘ 1

Definition of the criterion
Consider the design criterion (4), δ(ξ) = min ‖(a_0(ξ) + F_0(ξ)θ_0) − (a_1(ξ) + F_1(ξ)θ_1)‖ over θ_0 ∈ Θ̃_0, θ_1 ∈ Θ̃_1. The criterion can be viewed as an approximation of the nearest distance between the mean-value surfaces of the models in the neighborhoods of the vectors (η_0(θ̃_0, x_i))_{i=1}^n and (η_1(θ̃_1, x_i))_{i=1}^n; see the illustrative Figure 2.
We will now express the δ-criterion as a function of the design ξ = (x_1, …, x_n), represented by the counting measure on X defined by ξ(x) = #{i : x_i = x}, where # denotes the size of a set. Let θ̃ = (θ̃_0^T, θ̃_1^T)^T. For all x ∈ X, let f(x) = (∇η_0^T(θ̃_0, x), −∇η_1^T(θ̃_1, x))^T.

Computation of the criterion value for a fixed design
For a fixed design , expression (5) shows that 2 (| ) is a quadratic function of = ( 0 , 1 ) . Moreover, both (| ) and 2 (| ) are convex because they are compositions of an affine function of and convex functions ‖.‖ and ‖.‖ 2 , respectively. Clearly, if the flexible nominal sets are compact, convex, and polyhedral, optimization (4) can be efficiently performed by specialized solvers for linearly constrained quadratic programming.
Alternatively, we can view the computation of δ(ξ) as follows: the minimization in (4) is equivalent to computing the minimum sum of squares for a least squares estimate of β restricted to Θ̃ := Θ̃_0 × Θ̃_1 in the response difference model with artificial observations a_1(ξ) − a_0(ξ). Thus, if Θ̃_0 = ℝ^{m_0} and Θ̃_1 = ℝ^{m_1}, the infimum in (4) is attained, and it can be computed using the standard formulas of linear regression in the response difference model. If the flexible nominal sets are compact cuboids, (4) can be evaluated by the very rapid and stable method for bounded-variable least squares implemented in the R package bvls; see Stark and Parker (1995) and Mullen (2013).
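For cuboid flexible nominal sets centered at the nominal values, this computation is a box-constrained least-squares problem in the deviations u_j = θ_j − θ̃_j, since a_j(ξ) + F_j(ξ)θ_j = η̃_j(ξ) + F_j(ξ)u_j, where η̃_j(ξ) denotes the vector of nominal mean values. A minimal Python sketch, with scipy.optimize.lsq_linear standing in for the R package bvls (all names are ours, not from the paper):

```python
import numpy as np
from scipy.optimize import lsq_linear

def delta(eta0, eta1, F0, F1, hw0, hw1):
    """Linearized distance for one fixed design (a sketch, not the paper's
    implementation).  eta0, eta1: n-vectors of mean values at the nominal
    parameters; F0, F1: the n x m_j Jacobians there; hw0, hw1: cuboid
    half-widths boxing the deviations u_j = theta_j - theta_j_nominal.
    Minimizes ||(eta0 + F0 u0) - (eta1 + F1 u1)|| subject to the boxes,
    a bounded-variable least-squares problem."""
    A = np.hstack([F0, -F1])                 # coefficients of (u0, u1)
    b = np.asarray(eta1) - np.asarray(eta0)  # gap between nominal surfaces
    ub = np.concatenate([hw0, hw1])
    res = lsq_linear(A, b, bounds=(-ub, ub))
    return float(np.linalg.norm(A @ res.x - b))
```

Without the bounds this reduces to ordinary least squares in the response difference model, matching the unconstrained case discussed above.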
The following simple proposition collects the analytic properties of a natural analogue of δ defined on the linear vector space Ξ of all finite signed measures on X.
Positive homogeneity of δ² implies that a k-fold replication of an exact design leads to a k-fold increase of its δ² value. Consequently, a natural and statistically interpretable definition of the relative δ-efficiency of two designs ξ_1 and ξ_2 is δ²(ξ_1)/δ²(ξ_2), provided that δ²(ξ_2) > 0.
Let Ξ_n be the set of all n-point designs. A design ξ* ∈ Ξ_n will be called δ-optimal if ξ* ∈ argmax_{ξ ∈ Ξ_n} δ(ξ).
As the evaluation of the δ-criterion is generally very rapid, the calculation of a δ-optimal, or nearly δ-optimal, design is similar to that for standard design criteria. For instance, in small problems we can use complete enumeration, and in larger problems we can employ an exchange heuristic, such as the KL exchange algorithm (see, e.g., Atkinson, Donev, & Tobias, 2007).
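A bare-bones version of such an exchange heuristic can be sketched as follows (our own simplified single-point variant, not the full KL algorithm); crit is any rapidly evaluable criterion, such as δ:

```python
import itertools
import numpy as np

def exchange_optimize(crit, X, n, n_restarts=10, seed=0):
    """Greedy one-point exchange heuristic for exact designs on a finite
    design space X: start from a random n-point design and keep applying
    improving single-point exchanges until none exists; repeat from
    several random starts and return the best design found."""
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_restarts):
        design = list(rng.choice(len(X), size=n))   # indices into X
        val = crit([X[i] for i in design])
        improved = True
        while improved:
            improved = False
            for pos, cand in itertools.product(range(n), range(len(X))):
                trial = design.copy()
                trial[pos] = cand
                trial_val = crit([X[i] for i in trial])
                if trial_val > val + 1e-12:          # accept any improvement
                    design, val, improved = trial, trial_val, True
        if val > best_val:
            best, best_val = [X[i] for i in design], val
    return best, best_val
```

Being a local-search heuristic, it guarantees only a locally optimal design per restart; multiple restarts mitigate this in practice.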

Parametrization of flexible nominal sets
For simplicity, we will focus on cuboid flexible nominal sets centered at the nominal parameter values. This choice can be justified by the results of Šidák (1967), in particular if we already have confidence intervals for the individual parameters; see the further discussion in Section 4. Specifically, we will employ homogeneous dilations Θ̃_j(c), c ∈ [0, ∞), of fixed unit cuboids, so that c can be considered a tuning (set) parameter governing the size of the flexible nominal sets.
In (12), Θ̃_0^(1) and Θ̃_1^(1) are "unit" nondegenerate compact cuboids centered on the respective nominal parameters. For any design ξ and c ∈ [0, ∞], we define the corresponding criterion δ_c by (13). Note that for our choice of flexible nominal sets the infimum in (13) is attained. The δ_c-optimal values of the problem will be denoted by δ*_c. From (12) and (13), the inequality δ*²_{c_1} ≥ δ*²_{c_2} for c_1 ≤ c_2 follows from the fact that a maximum of nonincreasing functions is a nonincreasing function. The monotonicity of δ_c(ξ) and δ*_c in c can be shown analogously.
To prove the convexity of δ²_c(ξ) in c, let α ∈ (0, 1); a direct computation with the convex combination of the minimizing parameters at the two set parameters proves that δ²_c(ξ) is convex in c. The convexity of δ_c(ξ) in c can be shown analogously. The functions δ*² and δ*, as pointwise maxima of a system of convex functions, are also convex.
(b) For any design ξ of size n, the function δ²_∞(ξ|·) is nonnegative and quadratic on ℝ^{2m}; therefore, its minimum is attained at some β_ξ ∈ ℝ^{2m}. There is only a finite number of exact designs of size n, and Θ̃(c) ↑ ℝ^{2m} as c → ∞, which means that there exists c* < ∞ such that β_ξ ∈ Θ̃(c*) for all designs ξ of size n. Let c ≥ c*. We then obtain (i). Let ξ(∞) be any δ_∞-optimal n-trial design. The equality (i) and the fact that δ_c(ξ(∞)) and δ*_c are nonincreasing with respect to c give δ_c(ξ(∞)) ≥ δ_∞(ξ(∞)) = δ*_∞ = δ*_{c*} ≥ δ*_c, which proves (ii). □ The second part of Proposition 2.2 implies the existence of a finite interval [0, c*] of relevant set parameters; increasing the set parameter beyond c* leaves the optimal designs as well as the optimal value of the δ-criterion unchanged. We will call any such c* a set upper bound.
Algorithm 1 provides a simple iterative method for computing c*. Our experience shows that it usually requires only a small number of recomputations of the δ_c-optimal design, even for demanding settings of its tuning parameters, resulting in a good set upper bound c* (see the metacode of Algorithm 1 for details). Owing to the high speed and stability of the computation of the values of δ_c for candidate designs, it is possible to use an adaptation of the standard KL exchange heuristic to compute the input value δ*_∞, as well as to obtain the δ_c-optimal designs in steps 2 and 9 of the algorithm itself.
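The metacode of Algorithm 1 is not reproduced in this excerpt; purely as a loose illustration of the underlying idea (grow the set parameter c until the c-optimal design and value stop changing, which by Proposition 2.2 must happen beyond some finite c*), one might sketch:

```python
def grow_set_parameter(optimal_design, c0=1.0, factor=2.0, max_iter=60):
    """Heuristic stand-in for Algorithm 1 (our sketch, not the paper's):
    optimal_design(c) is assumed to return the (design, criterion value)
    pair for set parameter c.  We grow c geometrically and stop once two
    successive answers coincide.  Unlike the paper's algorithm, agreement
    of two successive answers is only a heuristic stopping rule, not a
    proof of stabilization."""
    c = c0
    prev = optimal_design(c)
    for _ in range(max_iter):
        c *= factor
        cur = optimal_design(c)
        if cur == prev:
            return c / factor, cur   # the previous c already looked stable
        prev = cur
    raise RuntimeError("no apparent stabilization within max_iter steps")
```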

FIGURE 3 δ_c-optimal designs of size n = 6 for different set parameters c; see the second part of the motivating example. The horizontal axis corresponds to the design space, and the vertical axis corresponds to different spans of the flexible nominal sets. For each c, the figure displays the number of repeated observations at the different design points of the δ_c-optimal design.
For some pairs of competing models there exists a set upper bound c* beyond which the values of δ_c are constantly 0 for all designs. These cases can be identified by solving a linear programming (LP) problem, as we show next.
Proof. From expression (7) we see that, for any design ξ and its nonreplication version ξ̄, δ_c(ξ̄) = 0 implies δ_c(ξ) = 0. Moreover, if ξ_2 ⪰ ξ_1, in the sense that ξ_2 is an augmentation of ξ_1, then δ_c(ξ_2) = 0 implies δ_c(ξ_1) = 0. Now let (c*, θ_0, θ_1) be a solution of (14), let c ≥ c*, and let ξ be any design. The definition of δ_c and the form of (14) imply δ_c(ξ_X) = 0, where ξ_X denotes the design containing each point of X exactly once. From ξ_X ⪰ ξ̄ we see that δ_c(ξ̄) = 0, hence δ_c(ξ) = 0. The proposition follows. □ Note that the c* obtained using Proposition 2.3 does not depend on n; that is, it is a set upper bound valid simultaneously for all design sizes. The basic discriminability condition implies that c* ≠ 0.
If the competing models are linear, the vectors a_0(ξ) and a_1(ξ) are zero. Therefore, (14) has a feasible solution (c, θ_0, θ_1) for any c ≥ 0 such that both Θ̃_0(c) and Θ̃_1(c) contain the zero vector. That is, for the case of linear models, there is a finite set upper bound c* beyond which the δ_c-values of all designs vanish. However, the same holds for specific nonlinear models, including the ones from Section 3.

Proposition 2.4. Assume that both competing regression models are linear provided that we consider a proper subset of their parameters as known constants. Then (14) has a finite feasible solution; that is, there exists a finite set upper bound c*.
Proof. Without loss of generality, assume that fixing the first k_0 < m components of θ_0 converts model 0 into a linear model. More precisely, let θ_{0,1}, …, θ_{0,m} denote the components of θ_0 and assume that η_0(θ_0, x) is a linear combination of θ_{0,k_0+1}, …, θ_{0,m} with coefficients given by some functions f_l^(0), l = k_0 + 1, …, m, of the first k_0 components and of x. Choose θ̂_0 such that θ̂_{0,l} = θ̃_{0,l} for l = 1, …, k_0, and θ̂_{0,l} = 0 for l = k_0 + 1, …, m.
Make an analogous assumption for model 1 and also define θ̂_1 analogously. It is then straightforward to verify that for the design ξ_X from Proposition 2.3 we have F_j(ξ_X)θ̂_j + a_j(ξ_X) = 0_N, where N = #X, for both j = 0, 1. Therefore, any (c, θ̂_0, θ̂_1) such that θ̂_0 ∈ Θ̃_0(c) and θ̂_1 ∈ Θ̃_1(c) is a solution of (14) in Proposition 2.3. □ In the following, we numerically demonstrate that the δ design criterion leads to designs that yield a high probability of correct discrimination.

AN APPLICATION IN ENZYME KINETICS
This real applied example is taken from Bogacka, Patan, Johnson, Youdim, and Atkinson (2011) and was already used in Atkinson (2012) to illustrate model-discrimination designs. There, two types of enzyme kinetic reactions are considered, in which the reaction velocity is alternatively modeled as

η_0(θ_0, x) = θ_{0,1} x_1 / (θ_{0,2}(1 + x_2/θ_{0,3}) + x_1),   (15)
η_1(θ_1, x) = θ_{1,1} x_1 / ((θ_{1,2} + x_1)(1 + x_2/θ_{1,3})),   (16)

which represent competitive and noncompetitive inhibition, respectively. Here x_1 denotes the concentration of the substrate and x_2 the concentration of an inhibitor. The data used in Bogacka et al. (2011), from an initial experiment of 120 observations on Dextromethorphan-Sertraline, yield the estimates displayed in Table 1, where Gaussian errors were assumed. The assumed parameter spaces were not explicitly given there, but can be inferred from their figures as θ_{0,1}, θ_{1,1} ∈ (0, ∞), θ_{0,2}, θ_{1,2} ∈ (0, 60], and θ_{0,3}, θ_{1,3} ∈ (0, 30], respectively. Designs for parameter estimation in these models were recently given in Schorning, Dette, Kettelhake, and Möller (2017).
In Atkinson (2012), the two models are combined into an encompassing model (17), in which a mixing parameter λ ∈ [0, 1] is introduced so that λ = 1 corresponds to (15) and λ = 0 to (16). Following the ideas of Atkinson (1972), as used, for example, in Atkinson (2008) or Perrone, Rappold, and Müller (2017), one can then proceed to find so-called Ds-optimal (i.e., D-optimal for only a subset of parameters) designs for λ and employ them for model discrimination. Note that this method is also not fully symmetric, as it requires a nominal value of λ for the linearization of (17), which induces some kind of weighting.
In Table 2 of Atkinson (2012), four approximate optimal designs (we will denote them A1-A4) were presented: the T-optimal designs assuming λ = 0 (A1) and λ = 1 (A4), a compound T-optimal design (A3), and a Ds-optimum design (A2) for the encompassing model (for the latter, note that Atkinson assumed λ = 0.8, whereas the estimates suggest a much higher value). We will compare our δ-optimal designs against exact versions of these designs, properly rounded by the method of Pukelsheim and Rieder (1992).

Confirmatory experiment, n = 6, normal errors
Let us first assume that we want to complement the knowledge from our initial experiment by another experiment for which, however, we are given only limited resources, for example, a sample size of only n = 6 observations. Note that the aim is not to augment the previous 120 observations but to make a confirmatory decision using just the new observations. That is, we use the data from the initial experiment only to provide nominal values for the parameter estimates and noise variances for the simulation. This is a realistic scenario if, for instance, for legal reasons the original data had to be deleted and only summary information is available. As we are assuming equal variances for the two models, we use the estimate of the error standard deviation σ̂ = 0.1526 from the encompassing model as a base value for the simulation error standard deviation. However, using σ̂ was not very revealing, for the discriminatory performance was consistently high for all designs. Thus, to accentuate the differences, the actual standard deviation used was 2 × σ̂ instead (unfortunately, an even higher inflation is not feasible, as it would result in frequent negative observations leading to faulty ML estimates). We then simulated the data-generating process under each model 10,000 times and calculated the total percentages of correct discrimination (hit rates) when using the likelihood ratio as the decision rule.
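The evaluation loop just described can be sketched generically (our own Python illustration, not the paper's code): data are simulated under the true model, both candidate models are fitted, and the likelihood-ratio rule reduces, under equal normal error variances, to comparing residual sums of squares:

```python
import numpy as np

def hit_rate(design, eta_true, theta_true, rss0, rss1, sigma,
             n_sim=2000, seed=0):
    """Monte-Carlo estimate of the probability of correct discrimination
    when model 0 is true.  rss0(y, design) and rss1(y, design) must return
    the minimal residual sums of squares under the two candidate models;
    a run counts as a hit when model 0 attains the smaller RSS."""
    rng = np.random.default_rng(seed)
    mean = np.array([eta_true(theta_true, x) for x in design])
    hits = 0
    for _ in range(n_sim):
        y = mean + sigma * rng.standard_normal(len(design))
        hits += rss0(y, design) < rss1(y, design)
    return hits / n_sim
```

The per-model RSS minimizers can be closed-form (for models linear in their parameters) or calls to a nonlinear least-squares routine.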
We compare the designs A1-A4 to three specific δ-optimal designs D1, D2, and D3, which represent a range of different nominal intervals. Specifically, we chose Θ̃_j = [θ̃_{j,1} ± c σ̃_{j,1}] × [θ̃_{j,2} ± c σ̃_{j,2}] × [θ̃_{j,3} ± c σ̃_{j,3}], j = 0, 1, where the nominal values θ̃_{j,k} and the half-width units σ̃_{j,k} were taken from the estimates of the initial experiment. The tuning parameter c was set to three levels: c = 1 (which is close to the lower bound still providing a regular design), c = 5, and c = 15 (which is sufficiently close to the theoretical upper bound to yield a stable design), respectively. To make the latter more precise: the models under consideration are such that if we fix the last two of the three parameters, they become one-parameter linear models. Therefore, by Proposition 2.4, we know that there exists a finite set upper bound c*. Solving (14) provides the numerical value c* ≈ 64.02. Note that the same bound is valid for all design sizes n. Designs A1-A4 and D1 all contain four support points, while D2 has six and D3 has five, respectively. A graphical depiction of the designs is given in Figure 4.
Robustness study: As we would like to avoid comparing designs only when the data are generated from the nominal values (although this favors all designs equally), we perturbed the data-generating process by drawing the parameters from uniform distributions on θ̃ ± p × σ̃, where p then acts as a perturbation parameter. Under these settings all the designs fare quite well, as can be seen from Table 3. However, A4 and D2 seem to outperform the other competing designs, by usually narrow margins, except perhaps for A1, which consistently does worst. Note that in a real situation the true competitors of the δ-optimal designs are just A2 and A3, as it is unknown beforehand which model is true.
FIGURE 5 Boxplot of the total correct classification rates for all designs using nominal values and error standard deviations of 5 × σ̂; white under model 0, grey under model 1

A second large-scale experiment, n = 60, log-normal errors
As the discriminatory power of all the designs for n = 60 is nearly perfect, we are required to inflate the error variance. However, using additive normal errors in the data-generating process and inflating the variance by a large enough factor would generate a large number of negative observations, which renders likelihood estimation invalid. The data-generating process was therefore adapted to use multiplicative log-normal errors. The observations were then rescaled to match the means of the original process. This way we are at liberty to inflate the error variance by any factor without producing faulty observations. Note that the data-generating process now does not fully match the assumptions under which the designs were generated, but this can simply be considered an extended robustness study, as it applies to all compared designs equally. We could, of course, also have calculated the designs under the same data-generating process, but as the fit of the model to the original data is not greatly improved, and models (15) and (16) seem firmly established in the pharmacological literature, we refrained from doing so.
Perturbation of the parameters here did not exhibit a discernible effect, while the error inflation still does. For brevity, we report here only the results for 5 × σ̂ (and p = 0). The respective designs D1-D3 were qualitatively similar to those given in Figure 4, albeit with more diverse weights. In this simulation we generated 100 instances of n = 60 observations from these designs a thousand times.
The corresponding boxplots of the correct classification rates are given in Figure 5. In this setting A4 seems somewhat superior even under model 1 (recall that it is the T-optimum design assuming model 0 true), while D1 and D2 come close (and beat the true competitors A2 and A3), with A1 again being clearly the worst.

CONCLUSIONS AND DIRECTIONS FOR FURTHER RESEARCH
We have presented a novel design criterion for symmetric model discrimination. Its main advantage is that the design computations, unlike those for T-optimality, can be carried out with efficient routines of quadratic optimization, which in general enhance the speed of computation by an order of magnitude. An optimal exact design problem is a problem of discrete optimization, and the efficiency of its solution critically depends on the speed of evaluation of the design criterion. By a series of approximations, we substituted the theoretically ideal but numerically infeasible computation of the probability of correct discrimination with a simple convex optimization, which can be solved rapidly and reliably. Combined with the proposed methodology of flexible nominal sets, we can construct an entire sequence of exact experimental designs efficient for discrimination between models. It was also shown in an example that the resulting designs are competitive in their actual discriminatory abilities.
The notion of flexible nominal sets may have independent merit. Note again the distinction between parameter spaces and flexible nominal sets (and thus the principal distinction from "rigid" minimax approaches). Parameter spaces usually encompass all theoretically possible values of the parameters, while flexible nominal sets can contain the unknown parameters with very high probability and still be significantly smaller than the original parameter spaces. In this paper, we do not completely specify the process of constructing the flexible nominal sets, but if we perform a two-stage experiment with a second, discriminatory phase, their potential specification through confidence intervals is an important problem.
As the approach suggested offers a fundamentally new way of constructing discriminatory designs, many properties are yet unexplored. A nonexhaustive list of questions follows.
Sequential procedure. The proposed method lends itself naturally to a two-stage procedure, where parameter estimates and confidence intervals are employed as nominal values in the second stage. Even sequential generation of design points can be straightforwardly implemented.
Approximate designs. Proposition 2.1 is a possible gateway to the development of the standard approximate design theory for δ-optimality, because the criterion δ² is concave on the set of all approximate designs. Therefore, it is possible to work out a minimax-type equivalence theorem for δ-optimal approximate designs, and to use specific convex optimization methods to find δ-optimal approximate designs numerically. For instance, it would be possible to employ methods analogous to Burclová and Pázman (2016) or Yue, Vandenberghe, and Wong (2018).
Utilization of the δ-optimal designs for related criteria. For a design ξ = (x_1, …, x_n), a natural criterion closely related to δ-optimality is δ̃(ξ), the analogous distance computed from the original, nonlinearized mean-value surfaces. The criterion δ̃ requires a multivariate nonconvex optimization for its evaluation at each design ξ, which entails possible numerical difficulties and a long time to compute an optimal design. However, the δ-optimal design, which can be computed rapidly and reliably, can serve as an efficient initial design for the optimization of δ̃. Note that if Θ̃_0 is a singleton containing only the nominal parameter value for model 0, the δ-optimal designs could potentially be used as efficient initial designs for computing the exact version of the criterion of T-optimality.
Selection of the best design from a finite set of possible candidates. As with most proposals for the construction of optimal experimental designs, the method depends on the choice of some tuning parameters, or even on entire prior distributions (in the Bayesian approach), which always results in a set of possible designs. It would be interesting to develop a comprehensive Monte-Carlo methodology for the choice of the best design out of this preselected small set of candidate designs. A useful generalization of the rule would take into account possibly unequal losses for wrong classification.
Noncuboid sets. The methodology could certainly be extended to other types of flexible nominal sets, particularly when we are interested in functional relations among the parameters. However, the particularly efficient box-constrained quadratic programming algorithm could then no longer be utilized.
Higher order approximations. As a referee remarked, it is possible to employ tighter approximations of the sets of mean values of responses than the one that we suggest. For instance, it would be possible to use the local curvature of the mean-value function. However, this may also lead to a loss of the numerical efficiency of the method.
More than two rival models. Another referee remark leads us to point out the natural extension of investigating a weighted sum, or the minimum, over all paired comparisons. The implications of these suggestions, however, require deeper investigation.
Different error variances. Yet another referee requested a clarification of how to proceed in the case of unequal error variances for the two models. If the functional forms of these variances are known, simple standardizations of the models will suffice. All other cases, including dependencies among the errors, will require more elaborate strategies.
Combination with other criteria. The proposed method can produce designs that are poor, or even singular, for estimating the model parameters. Because of this problem, already mentioned in Atkinson and Fedorov (1975), Atkinson (2008) used a compound criterion called DT-optimality. The same approach is possible for δ-optimality. However, our numerical experience suggests that for a large enough size of the flexible nominal sets, the δ-optimal designs tend to be supported on a set that is large enough for estimability of the parameters, without any combination with an auxiliary criterion. A detailed analysis goes beyond the scope of this paper.