Locally R-optimal designs for a class of nonlinear multiple regression models

This paper concerns with optimal designs for a wide class of nonlinear models with information driven by the linear predictor. The aim of this study is to generate an R-optimal design which minimizes the product of the main diagonal entries of the inverse of the Fisher information matrix at certain values of the parameters. An equivalence theorem for the locally R-optimal designs is provided in terms of the intensity function. Analytic solutions for the locally saturated R-optimal designs are derived for the models having linear predictors with and without intercept, respectively. The particle swarm optimization method has been employed to generate locally non-saturated R-optimal designs. Numerical examples are presented for illustration of the locally R-optimal designs for Poisson regression models and proportional hazards regression models.


Introduction
Generalized linear models (GLMs) have been used quite effectively in statistical modelling but the associated design issues are undoubtedly challenging, since the intensity function within their information matrices depends on the value of the linear predictor, which means that the optimal designs are related to the unknown parameters. Following Konstantinou et al. (2014), we note that the information matrix of the proportional hazards regression models used in survival analysis also has the same feature when type I and random censoring are considered, and their intensity functions, which are similar to Poisson regression models and negative binomial regression models, are strictly monotonic rather than a symmetric structure that appears in the logistic and probit models. In this paper, we focus on such models with monotonic intensity functions.
In this direction, there is increasing interest in determining optimal designs under various criteria, especially for models with multiple covariates. Initial work was done by Konstantinou et al. (2014) who provided analytical results of D-and c-optimal designs for this class of models with only one covariate. Subsequently, Schmidt and Schwabe (2015) extended the results concerning D-optimality to one-dimensional discrete design space. For multiple regression, Schmidt and Schwabe (2017) determined D-optimal designs by identifying a complete subclass, which contains the results of Russell et al. (2009) for the Poisson regression model as a special case. Radloff and Schwabe (2019) gave a construction method of D-optimal designs when the design region is a k-dimensional ball. Recently, Schmidt (2019) systematically characterized c-, L-and k -optimal designs for models with a single covariate and for multiple regression with an arbitrary number of covariates.
It should be noted that the linear predictor in the previously mentioned literature always includes an intercept term. The intercept in GLMs and proportional hazards regression models with censoring, respectively, characterizes the expected mean and expected survival time when all the explanatory variables are equal to zero. In this case, the linear predictor reflects the influence of all the unobserved fixed variables in these models. As pointed out by Idais and Schwabe (2021), when the intercept is significantly zero, i.e., the average impact of all the unobserved fixed variables is significantly zero, one may claim that the model includes probably most variables which explain the outcome. For the gamma models without intercept, Idais and Schwabe (2021) obtained some explicit solutions of D-and A-optimal designs in some multi-linear cases, including the two-factor model with interaction.
This paper aims to provide a characterization of R-optimality for multiple regression models with and without intercept. It is well known that the R-optimality criterion proposed by Dette (1997) has a nice statistical interpretation, namely minimizing the volume of the Bonferroni rectangular confidence region of the regression parameters. Moreover, it satisfies an extremely useful invariance property which allows an easy calculation of optimal designs on many linearly transformed design spaces. This optimality has been frequently applied to the cases of multi-response experiments, multi-factor experiments and mixture experiments, see, e.g., X. Liu and Yue (2020), P. Liu et al. (2022) and Hao et al. (2021) for some recent references.
In general, the dependence of designs on parameters whose values are unknown a priori occurs in nonlinear regression models. Hence, utilizing a pre-specified parameter value we can obtain the so-called locally optimal designs in accordance with Chernoff (1953). In this paper, we concentrate on the construction of locally optimal designs and take R-optimal designs to replace locally R-optimal designs for simplicity. Some general notations and a brief introduction of the R-criterion are presented in Section 2. In Sections 3 and 4, we analytically and numerically determine R-optimal designs for models having an intensity function which only depends on the value of the linear predictor with and without intercept, respectively. A brief discussion is given in Section 5. All proofs are included in the Appendix.

Model specification and R-optimality criterion
Throughout the paper, we focus on a class of nonlinear multiple regression models with information driven by the linear predictor defined on a given design region X , and consider the approximate designs ξ of the form Silvey, 1980, p. 15). The Fisher information matrix of ξ with independent observations is assumed to be where Q ≡ 0 is the intensity/efficiency function (see Fedorov, 1972, p. 39) which only depends on the value of the linear predictor, f is a p × 1 vector of known regression functions, and β ∈ R p denotes the vector of p unknown parameters. This kind of information matrix is common in the widely used generalized linear models, while it may also arise in other models such as the exponential regression models in proportional hazards parametrization with various censoring, including type I and random censoring (see Konstantinou et al., 2014) as well as other censoring distributions (see Schmidt & Schwabe, 2017). Following Konstantinou et al. (2014), we further assume that the intensity function Q satisfies the following conditions.
It is clear that the intensity functions induced by Poisson regression models, negative binomial regression models and proportional hazards regression models with type I and random censoring abide by all the conditions above.
In what follows, we concentrate on the R-optimality of designs which minimizes the product of the diagonal entries of the inverse of the Fisher information matrix. A design ξ * ∈ is called R-optimal if it minimizes over , where is the set of all designs with a non-singular information matrix on X , and e j,p denotes the jth unit vector in R p . An important tool in optimal design theory is equivalence theorems that not only provide a characterization of the optimal design but also are the basis of many algorithms for their numerical construction (see, e.g., Yang et al., 2013;Freise et al., 2021). The following result gives the equivalence theorem for R-optimality.
Theorem 2.1: A design ξ * ∈ is R-optimal if and only if holds for all x ∈ X . Moreover, it is equal at the support points of the design ξ * .
In Sections 3 and 4 the minimally supported designs (i.e., the so-called saturated designs) will appear as candidates for determining R-optimal designs. A design ξ = {x i ; ω i } m i=1 has minimal support if the number of support points is equal to the number of parameters, i.e., m = p. Let D ω = diag(ω 1 , . . . , ω m ),X = Q 1/2 X with X = (f (x 1 ), . . . , f (x m )) and Q 1/2 = diag( Q(f (x 1 ) β), · · · , Q(f (x m ) β)). Accordingly, the information matrix (1) for the design ξ can be decomposed as M(ξ , β) =X D ωX . Furthermore, if the design ξ is saturated and the regression vectors located at the rows of X are linearly independent, the following result exhibits the optimal weights of a design with minimal support for the R-optimality criterion.
Lemma 2.1 (Pukelsheim & Torsney, 1993): The R-optimal weights for a saturated design ξ are given by where s 11 , . . . , s pp are the diagonal entries of the matrix Remark 2.1: Since the diagonal entries s jj of the matrix S depend on the weights, a fixed point iteration procedure is available to determine a solution of optimal weights. In step r + 1, the weight ω (r+1) j is equivalent to

R-optimal designs for models with intercept
In this section, we will describe how to construct R-optimal designs for multiple regression models with information matrices of the form (1) and satisfying the assumptions (A1)-(A4). More precisely, multiple regression with additive linear effects of the covariates in the linear predictor that contains an intercept term will be considered. We consider the multi-linear case which has f = (1, x ) with x = (x 1 , . . . , x p−1 ) ∈ X ⊂ R p−1 and denote the parameter vector β = (β 0 , β 1 , . . . , β p−1 ) for convenience. Suppose that the design region X is a multidimensional polyhedron. By applying the complete class results described in Theorem 2 and Lemma 1 of Schmidt (2019), R-optimal designs can be found from the complete subclass, which has at most two support points on each edge of X when the hyperplanes H η = {x ∈ R p−1 : f (x) β = η} are assumed to be bounded on X for all η ∈ R, The following theorem provides an approach to generate R-optimal designs on the rectangular design region for the model under consideration, the proof of which can be established by using the similar arguments as in Theorem 9 in Schmidt (2019) It should be pointed out that using the same reasoning as in the proof of Theorems 8 and 9 in Schmidt (2019) the uniqueness of the solution of the common system of equations described in Theorem 3.1 follows. Here the case p = 2 with one covariate is also satisfied, so p 2.
. . , p) denote the elements of the matrix S in (5) evaluated at a design with support points a − (x 1 /β 1 )e 1,p−1 , . . . , a − (x p−1 /β p−1 )e p−1,p−1 and a = (a 1 , . . . , a p−1 ) . If a solution exists, let x * i ∈ (0, ∞) and the weights ω * i for i = 1, . . . , p − 1 be the unique solutions of the common system of Equations (4) and (6) given by Remark 3.1: Let X = [u 1 , v 1 ] × · · · × [u p−1 , v p−1 ], when some of the parameters β 1 , . . . , β p−1 are equal to zero. It follows from Lemma 1 of Schmidt (2019) that the two endpoints of the corresponding edges must be the support points of optimal design, and the number of support points is possibly over p. This means that we need to use some numerical algorithms to generate the R-optimal design, such as the commonly used particle swarm optimization (PSO) method, see Section 4.2 for details.
The following two examples illustrate the results in Theorem 3.1 and Remark 3.1, and exhibit the performance of different designs in terms of the Bonferroni confidence intervals and the relative efficiencies of designs. In Example 3.1 the first-order Poisson regression model with intercept will be considered, and the difference of the averaged Bonferroni confidence intervals derived by R-and D-optimal designs as well as the balanced design ξ b will be investigated by a numerical simulation. Example 3.2 considers the Poisson regression model with two covariants which was discussed in Schmidt (2019), and the relative efficiency of R-optimal designs will be compared with other designs. In both of the examples, the intensity function Q is given by Example 3.1: Consider the first-order Poisson regression model with intercept and let X = [0, 5]. If we set β = (6, −1) and β = (1, 1) , the R-and D-optimal designs can be calculated numerically by Theorem 3.1 and Theorem 2 of Konstantinou et al. (2014). For example, under the R-optimality criterion with β = (6, −1) we obtain x * 1 = 2.1886 and ω * 2 = 0.5431 by solving the Equations (4) and (6). In Table 1, the results on optimal designs for both cases are reported. Furthermore, we carry out a simulation study in order to assess the difference of the Bonferroni confidence intervals derived by different designs. The averaged Bonferroni confidence intervals under different sample sizes are calculated by 10,000 simulation runs, which are shown in Table 1. We find from Table 1 that for both cases the coverage probabilities perform in a similar manner and much close to 0.95 as the sample size increases, and the width of the averaged Bonferroni confidence intervals obtained from R-optimal designs is narrower comparing with D-optimal designs and the balanced design ξ b when there exists a serious loss of R-efficiency. (4) and (6). The R-optimal design is given in Table 2. The PSO method was used to find the corresponding D-, R-, and A-optimal designs when β = (0, −1, 0) . The resulting designs are listed in Table 2. In this case, we find that the endpoints (0, 0) and (0, 5) must be the support points of optimal designs. Figure 1 displays the plot of the function φ(x, β) defined in (3) for the cases β = (0, −1, −1) and β = (0, −1, 0) , which indicates that the function φ attains its maximum 3 at each support point.

Example 3.2: Assume that the linear predictor in the considered Poisson regression model includes an intercept term. Let
To compare the performance of different designs, such as the common D-and A-optimality, we may calculate the related efficiency that usually defines the value of the criterion function for the optimal design relative to the value of the criterion function of a design and can thus take values between 0 and 1. For instance, the R-efficiency is defined as Eff R (ξ ) = ψ(ξ * , β)/ψ(ξ , β). The results are summarized in Table 2. For the case β = (0, −1, −1) , all the designs have relatively high efficiencies regarding the other optimality criteria. This occurs primarily because the designs have a similar structure. For the case β = (0, −1, 0) , we observe that the R-optimal design still has relatively high D-and A-efficiencies, but the A-optimal design has a large loss of R-efficiency.

Remark 3.2:
The optimal weights of saturated designs for L-and k -optimality criteria, except for D-optimality, depend on the parameters (β 1 , . . . , β p−1 ) in the same settings of Theorem 3.2.
To further illustrate Theorem 3.2, we consider the proportional hazards regression models with two types of censoring. Under type I censoring with a fixed censoring time c the intensity function Q is given by Q if the censoring times are assumed to follow a uniform distribution U(0, c) (see Konstantinou et al., 2014). For both the above-mentioned intensity functions the variable θ belongs to R.

Example 3.3:
For the proportional hazards regression models with type I and random censoring Schmidt (2019) discussed the behaviour of c-and k -optimal designs for different parameter values. Their results are consistent with Remark 3.2. Analogous to Schmidt (2019), we investigate the performance of R-optimal designs on the design region X = [0, 3] p−1 by comparing with a balanced design, say ξ b , which is supported at all vertices of X with equal weights. First, the censoring time c as an unknown parameter should be determined previously, which is closely 0.9603 † The abbreviation 'BCI' represents the Bonferroni confidence intervals of the parameters β 0 and β 1 , where the first row under each design is the simulation results of β 0 , and the second row is β 1 . ‡ The abbreviation 'CP' represents the coverage probabilities of 95% confidence intervals for the parameters β 0 and β 1 . Table 2. Comparison of R-optimal design with Dand A-optimal designs on X = [0, 5] 2 for the Poisson regression model discussed in Example 3.2.  , β))/det(M(ξ * , β))} 1/p and Eff A (ξ ) = tr(M −1 (ξ * , β))/tr(M −1 (ξ , β)), respectively, where det(·) and tr(·) are the matrix determinant and trace functions.
related to the amount of censoring q. Here q is also called the overall probability of censoring and given by q = 1 − ω i P(Y j < C), in which Y i is the survival time and C is the censoring distribution (see Kalish & Harrington, 1988). For the two-covariate case, we choose c = 32 for type I censoring and c = 69 for random censoring such that q is, respectively, equal to 60% for β = (3, −2.5, −2.5) , 71% for β = (3, −3, −3) and 75% for β = (3, −4, −4)  when the balanced design ξ b on X = [0, 3] 2 had been used. The R-optimal designs on X = [0, 3] 2 and R-efficiencies of ξ b are summarized in Table 3. We observe from Table 3 that the R-optimal designs shift the support points on the edges towards the vertex (0, 0) and the R-efficiencies of the balanced design ξ b decrease with increasing amounts of censoring. Note also that the R-efficiency of ξ b is quite low for these censoring scenarios. In addition, the R-optimal weights are the same for each model by Theorem 3.2.

R-optimal designs for models without intercept
In the present section we turn to discuss the multi-linear case without intercept, where the vector of the regressor functions is specified as f = (x 1 , . . . , x p ) ∈ X , p 2, and the parameter vector is denoted by β = (β 1 , . . . , β p ) . The regression model without an intercept is common and it usually arises from the physical characteristics of the variables measured. The design issue for this model has attracted considerable attention in the literature (see, e.g., Idais & Schwabe, 2021;Li et al., 2005 and the references cited there).
Then the support points of R-optimal designs are also at the edges of X by Theorem 2 in Schmidt (2019). Moreover, from the proof of Theorem 8 in Schmidt (2019) the support points of an R-optimal design ξ * must be given by a − (x 1 /β 1 )e 1,p−1 , . . . , a − (x p /β p )e p,p and a = (a 1 , . . . , a p ) with a i = v i if β i > 0 and a i = u i if β i < 0 for i = 1, . . . , p. As a result, the optimality condition for approximate designs in Theorem 2.1 can be simplified to verify whether it is satisfied at the boundary of the design region, i.e., the design ξ * is R-optimal if and only if holds for all x i , i = 1, . . . , p, where the function i (x i , ξ * ) is given by

Some theoretical results on saturated designs
In this subsection, we will provide two types of saturated R-optimal designs for the model considered by distinguishing whether the vertex a is a support point. The first result reveals designs which include the support point a, where the condition a τ = 0, τ ∈ {1, . . . , p}, described in Theorem 4.1 is to guarantee the design ξ * τ in (9) with a non-singular information matrix.
Theorem 4.1: Let the assumptions (A1)-(A4) be satisfied for models with information matrices of the form (1) and without intercept. Let s ij (i, j = 1, . . . , p) be the elements of the matrix S in (5) evaluated at a design of the form . For a given index τ (τ ∈ {1, . . . , p}) with a τ = 0, if a solution exists, let x * i ∈ (0, ∞) and the weights ω * i be the unique solutions of the common system of Equations (4) and (8) given by (7) and φ(·, ·) as in (3).
It is easily verified that the condition h 2 (x 2 , ξ * 2 ) < 0 is satisfied for x 2 ∈ [0, 5) and the design ξ * 2 is then R-optimal (see also Figure 2(a)). If we choose β = (1, 1) , however, it follows from Theorem 4.1 that the designs ξ * 1 and ξ * 2 are given by respectively. In this case, the conditions h i (x i , ξ * i ) < 0, i = 1, 2, are not satisfied, which means that both designs are not R-optimal. Accordingly, finding a saturated R-optimal design for this model that does not contain the support point a may be an alternative scheme. This type of R-optimal design will be elaborated in Theorem 4.2.

Theorem 4.2:
Let the assumptions (A1)-(A4) be satisfied for models with information matrices of the form (1) and without intercept. Let s ij (i, j = 1, . . . , p) be the elements of the matrix S in (5) evaluated at a design of the form . If a solution exists, let x * i ∈ (0, ∞) and the weights ω * i be the unique solutions of the common system of Equations (4) and (10) given by for all i, where z = (z 1 , . . . , z p ) with z j = a j β j /x j , will be R-optimal on X provided that the condition h i (a i , ξ * ) < 0 (or φ(a, β) < p) is satisfied.
Corollary 4.1: Let X = [0, ∞) p and let the assumptions (A1)-(A4) be satisfied for models with information matrices of the form (1) and without intercept. Then the following design ξ * is R-optimal for each β i < 0, i = 1, . . . , p, where the function ψ(·) is defined as Remark 4.1: It is worthwhile mentioning that the design (12) shown in Corollary 4.1 is also D-and A-optimal.

Remark 4.2:
For the case of p = 1, the one point design ξ * = {a; 1} that simultaneously satisfies the additional conditions in Theorem 4.1 is saturated R-optimal. However, if the design ξ * = {a; 1} is not R-optimal, the proposed method in Theorem 4.2 can be used to search for a saturated R-optimal design.

PSO-generated R-optimal designs
Only finding saturated designs may not be enough for determining the R-optimality of a design in the design class , since the optimal designs depend on the unknown parameters. For example, let β = (0.5, 0.5) for the Poisson regression model discussed in Example 4.1. The R-optimal design (see Figure 3 This means that the number of support points of R-optimal designs for models without intercept may exceed the number of regression parameters. Thereby, the aforementioned method by solving equations is unable to determine an R-optimal design in and we require an effective algorithm to generate an R-optimal design. Here we employ the PSO algorithm to find optimal designs for the models under consideration and a pseudo code of PSO is described as below. In Section 4.2, the support points of each position ξ i must be given by a − (x 1 /β 1 )e 1,p−1 , . . . , a − (x p /β p )e p,p and a.
The index t is equal to 0, 1, . . ., and two criteria can be used to end iteration, achieving a maximum number of iterations or verifying whether the equivalence condition attains a pre-specified threshold.
The notations v (t) i and ξ (t) i are, respectively, the current velocity and position for the ith particle. θ t is the inertia weight that modulates the influence of the former velocity, which can be a constant or a decreasing function with  1, . . . , n) and personal best criterion values as well as ξ gbest and global best criterion value.
return ξ gbest and global best criterion value.
values between 0 and 1. α 1 and α 2 are both random variables from U(0, 1). γ 1 and γ 2 are two constants reflecting the cognitive learning level and social learning level, respectively.

Example 4.3:
Consider the proportional hazards regression models with three covariates, in which the linear predictor does not include an intercept term. Let X = [0, 3] 3 and β = (−2.5, 0.5, 0.5) . In order to assess the effect of the amount of censoring q on R-optimal designs, we adjust the censoring time c to achieve overall censoring probabilities of 20%, 40%, 60% and 80% for the balanced design ξ b . For instance, we choose c = 60 for type I censoring and c = 133 for random censoring when q = 0.4. In this example, the PSO algorithm with 150 particles and 100 iterations is able to find the R-optimal design with the required accuracy, which can be implemented by R software in less than 20 seconds on a standard PC. Table 4 summarizes the numerical results, including R-efficiencies of the balanced design ξ b and the amount of censoring q * under the corresponding R-optimal design. Some of the points are clear from the numerical results.
• For the three-covariate case, the support points of R-optimal designs for type I censoring and random censoring exceed the number of parameters. • With increasing amounts of censoring, the R-optimal designs shift the support point on the edge (x 1 , 3, 3) towards the vertex a = (0, 3, 3) , and the R-efficiency of the balanced design ξ b is reduced gradually. • The overall probability of censoring q * under the R-optimal design is less than for the balanced design ξ b .

Discussion
The present paper investigates the construction of locally R-optimal designs for a large class of nonlinear multiple regression models. For the case of models with intercept, the R-optimal designs on a rectangular design region can be determined by utilizing the similar arguments in Schmidt (2019) but finding its optimal weight is different. We notice that the structure of the R-optimal designs is similar to those criteria reported in Schmidt (2019), especially in terms of the location of support points. For the case of models without intercept, however, with the same method we can determine the saturated R-optimal designs only from two design subclasses addressed in Section 4.1. Moreover, Table 4. The locally R-optimal designs on X = [0, 3] 3 for β = (−2.5, 0.5, 0.5) in the proportional hazards regression models, the R-efficiencies of the balanced design ξ b and the overall probability of censoring under the R-optimal design. q R -optimal design for type I censoring  Table 5. The R-efficiencies of the locally R-optimal designs on X = [0, 5] p−1 for various misspecified β for the first-order Poisson regression models with intercept. the PSO algorithm has been used to generate non-saturated R-optimal designs for both cases. Some conditions in Theorems 3.1, 4.1 and 4.2 are required, which ensure that the support points are located within the design region. If these conditions are not satisfied, optimal designs may then be having more support points. In addition, a nonlinear system of equations must be solved numerically in order to search for the saturated R-optimal designs. Although the existence of the solution to these equations is not proved theoretically, the numerical exploration shows that the solution in all considered examples always exists. It is worthwhile mentioning that the locally optimal designs discussed so far are derived for a given value of the model parameter vector β. One might choose such a value of β from an initial guess or estimation when some historical observations can be obtained. It might be of interest to study how the design will be affected by wrongly specified parameters. For illustration, we consider the locally R-optimal designs for the first-order Poisson regression models with an intercept on the design region X = [0, 5] p−1 for p = 2, 3, 4, respectively. In specific, we assume that the true parameter vector is β = (0, −1) for p = 2, β = (0, −1, −1) for p = 3, and β = (0, −1, −1, −1) for p = 4. We generate the locally R-optimal designs for various misspecified values of β, and calculate their Refficiencies with respect to the locally R-optimal design for the true parameter vector for each p = 2, 3, 4, which are shown in Table 5. It is observed from Table 5 that the efficiency of the locally R-optimal design for the misspecified value of β decreases as the value of β diverges from its true value, and the loss of R-efficiency is disastrous when the difference between the true value and the misspecified value of β is relatively serious. Numerical results with other examples yield similar conclusions, which are not reported here for the sake of saving space.
To overcome the parameter dependence of the locally optimal design, a commonly used approach to the computation of locally optimal designs is weighted designs (see, e.g., Atkinson et al., 2007, Chap. 18), where a prior distribution for β, which may be either discrete or continuous, is assumed in advance. Another method is the computation of maximin efficient designs, i.e., maximizing the minimal efficiency with respect to the parameters (see Dette, 1997;Konstantinou et al., 2014).

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
Lei He's work is supported by the National Natural Science Foundation of China [Grant Number 12101013] and the Natural Science Foundation of Anhui Province [Grant Number 2008085QA15]. Rong-Xian Yue's work is supported by the National Natural Science Foundation of China [Grant Numbers 11971318, 11871143].
In order to prove the following Theorems 4.1 and 4.2, the extended design region with x i ∈ (−∞, v i ] for β i > 0 and x i ∈ [u i , ∞) for β i < 0 will be considered.