Locally D-optimal Designs with Heteroscedasticity: a Comparison between Two Methodologies

The classic theory of optimal experimental designs assumes that the errors of the model are independent and have a normal distribution with constant variance. However, the assumption of homogeneity of variance is not always satisfied. For example when the variability of the response is a function of the mean, it is probably that a heterogeneity model be more adequate than a homogeneous one. To solve this problem there are two methods: The first one consists of incorporating a function which models the error variance in the model, the second one is to apply some of the Box-Cox transformations to both sides on the nonlinear regression model to achieve a homoscedastic model (Carroll & Ruppert 1988, Chapter 4). In both cases it is possible to find the optimal design but the problem becomes more complex because it is necessary to find an expression for the Fisher information matrix of the model. In this paper we present the two mentioned methodologies for the D-optimality criteria and we show a result which is useful to find D-optimal designs for heteroscedastic models when the variance of the response is a function of the mean. Then we apply both methods with an example, where the model is nonlinear and the variance is not constant. Finally we find the D-optimal designs with each methodology, calculate the efficiencies and evaluate the goodness of fit of the obtained designs via simulations. Palabras clave: D-eficiencia, Diseños D-óptimos, heterocedasticidad, trans-formación de Box-Cox.


Introduction
The optimal experimental designs are a tool that allows the researcher to know which factor levels should be experimented in order to obtain a best estimate of the parameters of the model with certain statistical criterion.One of the most popular criteria is the D-optimality which involves finding the design that minimizes the generalized variance of the parameter vector.The design depends on a regression model (1) that relates the response variable Y with the independent variable x Y = η(x, β) + (1) with η(x, β) a linear or nonlinear function of the parameter vector β and x.
Besides, if the researcher has the possibility to run N observations of the model (1), then there are the following assumptions: 1. the error components i , for i = 1, 2, . . ., N , are independent and 2. have a normal distribution with constant variance σ 2 .
For more information about the classic theory of optimal designs see Kiefer (1959), O'Brien & Funk (2003), Atkinson, Donev & Tobias (2007, Chapter 9), López & Ramos (2007).However, in practice there are cases where the homogeneity assumption is not satisfied.For example when the variance of the response is a function of the mean, it can increase or decrease depending of the structure of the variance.The issue of heteroscedasticity in nonlinear regression models is discussed in detail in Seber & Wild (1989, pp. 68-72).Basically there are two methodologies to handle this problem.The first one is to apply some of the Box-Cox transformations to the model (1) with an appropriate λ 1 that stabilizes the variance of the errors.We identify the transformed model like model A: where is assumed that the new errors * i have a normal distribution with constant variance.
The second model, which we identify as model B, consists of incorporating the variance structure of the errors in the model as follows: where the errors i are independent N (0, σ 2 (η(x i , β)) λ2 ), with λ 2 an adequate power parameter that models the variance of the errors.
As Seber & Wild (1989) emphasize, the difference between models A and B is "that model A transforms so that y (λ1) has a different distribution from y as well as having a homogeneous variance structure, while model B models the variance heterogeneity but leaves the distribution of y unchanged".Also, the authors affirm that model B has often been preferred to model A when the deterministic function is linear, whereas models like A have been preferred in nonlinear models.Now, in the context of optimal designs when the model has heteroscedasticity, the problem to find D-optimal designs is more complicated than in the homogeneous case, because the D-optimality criterion maximizes the determinant of the Fisher information matrix of the model and the expression of this matrix changes when the variance is not constant.Because the information matrix depends of the model used, the two methodologies mentioned before for handling of heteroscedasticity are traditionally applied in separate ways.For example, Atkinson & Cook (1997) apply some of the Box-Cox transformations that makes the transformed model be linear with a constant variance and then they find local and Bayesian D-optimal designs to several models.On the other hand, in the case of linear models, Atkinson & Cook (1995) find local D-optimal designs for heteroscedastic linear models for various structures of variance, one of them is when the logarithm of variance is a linear function of the independent variable.Other authors have worked with nonlinear models, see for example Dette & Wong (1999).
In this paper we compare the methodologies mentioned above, analyze the structure of the information matrix and we find the D-optimal design for a specific model.Finally we compare the designs obtained through the D-efficiency.This paper is divided in four sections.In section 2 we present a brief summary of both methodologies for the D-optimality criterion and show a result which is useful to find D-optimal designs for heteroscedastic models when the variance of the response is a function of the mean (we omit the proof due to length constraints).In section 3 we illustrate both methods with an example and we compare results using the D-efficiency of each design.Then, we simulate observations of the model for each design and we calculate the relative error and mean square error.Finally, in section 4 we present some conclusions, discussions and suggestions.

Methodologies
Starting with a regression model of the form (1) with the usual assumptions, the problem of optimal designs consists to find the levels of the x factor where the researcher should experiment to obtain a best estimate of the parameters of the model under certain statistical criterion.In this paper we focus on the Doptimality criterion, which finds the design that minimizes the generalized variance of the parameter vector (Atkinson et al. 2007, pp. 135).More precisely, a design ξ is defined as a measure of probability with finite support denoted by: where n is the number of support points, x 1 , x 2 , . . ., x n are the support points of the design with associated weights w i ≥ 0 and such that & Funk 2003).If the weight w i is any number between 0 and 1, the design ξ is known as a continuous design.However in practice all designs are exact.This means that the weights w i are associated with the frequency of the support points (Atkinson et al. 2007, pp. 120).Now, the main problem of optimal designs is to find a design ξ over a compact region χ, that maximizes a functional of the information matrix M (ξ).This matrix plays an important role in the theory of optimal experimental designs.The structure of this matrix depends on the linear nature of the model and on the assumptions about the errors.When the variance of the errors is constant, this matrix has a known expression, see for example López & Ramos (2007).However, in the case of heteroscedastic models this expression is more complex and depends of the methodology applied.So, in the next two sections we analyze the structure of the Fisher information matrix with each of the two methodologies mentioned before.

Variance Modelling
When the variance of the error is not constant, one way to solve this problem is to find an adequate function which models the error variance and incorporate it in the regression model.There are many ways to do this, see for example Huet, Bouvier, Poursat & Jolivet (2004, pp. 65) and Seber & Wild (1989, pp. 68-72).One form is when the variance of the response is a power function of the mean: where σ 2 is the constant variance, τ is an unknown parameter and it should be estimated.The model (5) with variance structure is known as the power of the mean variance model (Ritz & Streibig 2008, pp. 74).
Now, some authors have worked the problem to find D-optimal designs modeling the variance.For example Dette & Wong (1999) find D-optimal designs for the Michaelis-Menten model when the variance is a function of the mean and Atkinson & Cook (1995) find D-optimal designs for heteroscedastic linear models.The following result is taken from Downing, Fedorov & Leonov (2001); they show the expression of the information matrix for a more general model than the power of the mean variance model (5): where θ is the parameter vector and it can include the parameters of the deterministic function η and those of the function S(x, θ) as a positive function used to model the variance of the error.Observe that the power of the mean variance model ( 5) is a nested model of the more general model ( 6).In this case the parameter vector θ includes all the possible parameters of the model: β, τ and σ 2 .So, results about the general model (for instance the next theorem)can be applied in particular for the power of the mean variance model.
, where S(x, θ) > 0 is a positive function, θ q×q is the parameter vector and χ a compact set.If the N observations {y i , x i } N i=1 are independent, then the Fisher information matrix for the approximate design ξ over the regression design χ is where This theorem is the main tool of this methodology, because it allows the researcher many ways of modelling the variance and incorporate it in the model.
Corollary 1.For the power of the mean variance model given in (1), where the errors are independent and have normal distribution with mean zero and variance var( i ) = σ 2 (η(x i , β)) 2τ with β, τ and σ 2 parameters, the information matrix is given by where Revista Colombiana de Estadística 37 (2014) 95-110 and for i = 1, 2, . . ., n: This result is the key at the construction of D-optimal designs and can be implemented computationally to obtain the designs.We will illustrate the use of this corollary with an application in the next section.But before we need the following important result which is one of the equivalence theorems.This theorem allows to verify if the obtained design is in fact the optimal design (Kiefer & Wolfowitz 1960) Theorem 2. D-optimality equivalence theorem.
Let M (ξ, θ) q×q the information matrix of the design ξ positive, Ψ(ξ, θ) = log |M (ξ, θ)| the D-optimality criterion and χ a compact set.Then the design ξ * is D-optimal if the directional derivative of φ in ξ * on the direction of ξ x holds where This result is useful to verify the D-optimality of a design ξ * , because one can plot the directional derivative φ(M (ξ * , θ), M (ξ x , θ)) over x ∈ χ and to check that this function at most zero over all experimental region (χ) and also that in the support points of the design, the equality holds.

Transformation of the Model
The second methodology consists of applying an adequate transformation on the model to obtain constant variance.We focus on the Box-Cox transformations, which are given by Box & Cox (1964).
The value of the parameter λ usually is unknown, but in some cases it can be assessed depending on the response.For instance, if the response is a volume, the appropriate transformation can be the cube root (λ = 1/3) and the square root if the response corresponds to count data (Atkinson & Cook 1997).Now, Atkinson & Cook (1997) find D-optimal designs when a Box-Cox transformation is applied, the resulting model is linear Revista Colombiana de Estadística 37 (2014) 95-110 and the errors have normal distribution with constant variance * ∼ N (0, σ 2 ).
However, as illustrated in the example and since the original model is nonlinear, we must find some appropriate λ such that when we apply the transformation to both sides of the model, the transformed model is linear in the parameters.It is important to observe that in our case, the parameter λ will be known, which is an advantage, because we do not need to estimate this parameter.However, when the parameter λ is unknown, it is possible to find the design, see for example Atkinson (2003) for more details.
Then, the authors show that the information matrix over the design region for the transformed model is (see the details in Atkinson & Cook 1997) where the symmetric matrix I(θ) is given by denote the first and second derivative respect to λ and are given by: However, these expressions have to be approximated using first-order Taylor approximations, since the expected values can not be calculated exactly.Finally, once the design is found using the above expressions, is necessary verify the Doptimality of the design using a similar result of the equivalence theorem 2.

Example
In Section 2 we described the two methodologies commonly used to handle the heteroscedasticity of a model.Now we illustrate these methods with one example.

PCB Model
The example consists of a study realized in 1972 in Lake Cayuga, New York, where the concentrations of Polychlorinated biphenyls (PCB) were made in a group of 28 trout at several ages in years."The ages of the fish were accurately known because the fish are annually stocked as yearlings and distinctly marked as to year class" (Bates & Watts 1988, pp. 267-268).The data taken from Bates & Watts (1988), are shown in the table 1 and the scatter plot is shown in figure 1.The plot of the data shows that the concentration of Polychlorinated biphenyls (PCB) increases when the age of the trout does.Also, the relationship between the variables clearly is not linear, so we propose to fit the nonlinear model: with β 1 , β 2 are unknown parameters to be estimated.Now, we are going to find the D optimal design for this model with the two methodologies described above.Because our designs are local, we use the data only with the purpose to have a good local value of the parameter vector.

Variance Modelling
First, we apply the methodology consisting on modelling the variance of the errors with an appropriate function.In figure 1, we see that the variability of the concentration increases as a power function of the mean, so we propose to fit the model ( 21) with variance structure (5) with τ an unknown parameter to be estimated.Now, we fit the model ( 22) in R Development Core Team (2013) and we used the gnls function for the generalized nonlinear least squares method.The results of the estimation are showed in table 2.  The conclusion from this test that is the parameter τ = 0, e.g. the model with variance structure ( 22) is better than the model with constant variance ( 21) with a signification level of 1%.

D-Optimal Design
Now, we find the D-optimal design for the model with variance structure ( 22).Because we work with local designs, we use the estimation of the parameters obtained previously like the local value for θ; that is, we use the local value θ 0 = (β 1 , β 2 , τ, σ) = (0.91, 0.31, 1.19, 0.34).Then we implement the corollary 1 through an algorithm in R Development Core Team (2013) and minimize − log(|M (ξ, θ)|).In this optimization problem we use the function nlminb over the experimental region (χ).The local D-optimal design obtained is shown in table 4 and is denoted by ξ D .The x i are the support points of the design and the w i the weights.As we can see, even though the model with variance structure (22) has four parameters to be estimated, the design consists only of two points, which are the extreme points of the regression range χ = [1,12].In this sense, if we could repeat the experiment and our objective are to estimate the parameters with minimum variance, then we measure the Polychlorinated biphenyls concentration in trout with ages of one and twelve and with equal number of replicates.
Then we check that the obtained design ξ D is D-optimal.With this in mind, by the D-optimality equivalence in theorem 2, we must verify that the directional derivative of Ψ at ξ D in the direction of the design that puts all mass at x, ξ x , As we can see in figure 2, this condition holds and the derivative equals zero at the support points, so the design ξ D is indeed D-optimal.

Simulations
We simulate 1,000 times 28 observations of the model with variance structure taking the values of x i like the support points of the design ξ D .Then we simulate the errors i ∼ N (0, σ 2 (β 1 e β2 ) 2τ ) for i = 1, 2, . . ., 28; use the estimations obtained in table 2 like the values of the parameters and with the model ( 24), we calculate the response y i s.Then, with these simulated data, we obtain the estimated parameter θ and calculate the relative and mean square error (RE and MSE respectively).We repeat this process 1,000 times and summarize it in table 5, showing the descriptive measures for both errors.This table shows the mean, median, range and standard deviation for the MSE of each parameter of the model ( 24) and the relative error in percentage RE(θ)×100%.For the parameter vector θ we propose an overall discrepancy measure, ODM, defined as ODM ( θ) = ||θ − θ|| 2 .From this table, we see that the central tendency measures for the MSE are small as the variability between the simulations.Also, the mean and median for the RE are very close to 10%.In general, all these measures indicate that the local design ξ D provides good parameter estimates, even though the design only has two experimental points and the model four parameters.

Efficiencies
Finally, we show the robustness of the design ξ D with respect to the choice of the local value θ 0 , through the D-efficiency of any design ξ: where p is the number of parameters of the model and M (ξ) denotes the information matrix of the design, where ξ is another design obtained with another local values of parameter vector.With this in mind, we perturb each one of the four parameters of the model ( 22) in a percentage ∆: Since the model has four parameters and each one can be perturbed at left, at right or not be perturbed; it is clear that the total number of perturbations is 3 4 = 81.Then each one of these perturbations will give us a design ξ and with (25) we calculate how far we are of the local D-optimal design.Then for a fixed ∆ = 0.6 (we could used another), we obtain 81 designs and for each one we calculate the respective D-efficiency.However, because most of these designs were equal to the two point design ξ D , we only show in table 9 (see the appendix) the support points, the weights and the D-efficiency of the 36 designs that were different to the optimal.Figure 3 summarizes the results of the efficiencies and shows that the design ξ D is robust respect the choice of the local value θ 0 , because the D-efficiencies are high (at least 0.80).

Transformation of the Model
Previously we apply the first methodology of variance modelling and find the local D-optimal design.Now we use the second methodology, that consists on applying an adequate transformation on the model.As we described in section 2.2, this transformation should be such that the transformed model is linear and homoscedastic.In this case as the model ( 1) is exponential, the appropriate Box-Cox transformation consists on applying logarithm to both sides: or equivalently in the form: where Y * = log Y , β * 1 = log β 1 and the new errors * are normal with constant variance.Then we fitted the linear model ( 28) and we obtained the estimations β * = (0.03, 0.26) T , σ = 0.57, and then β = (e 0.03 , 0.26) T = (1.03,0.26) T .Finally, we implemented an algorithm in R Development Core Team (2013) to find the information matrix with λ = 0 and to obtain the design that minimizes − log M (ξ, θ) over χ = [1,12].The resulting design in table 6, shows that in this case the design is the same obtained with first methodology.However, we have to point out that despite that the resulting design is the same with both methodologies, it is attributed to the fact that with each method we used the best local value for the parameter θ and as we saw when we calculate the D-efficiencies, the design can have three support points depending on the local value used.

Simulations
Analogously to the first methodology, we simulated 28 observations of the model ( 28).The results of the 1, 000 simulations are summarized in table 7.This is similar to the table 5 and shows the mean, median, range and standard deviation for the MSE of each parameter of the model ( 27) and relative error in percentage RE(θ) × 100%.For the parameter vector θ we propose a measure defined as ODM ( θ) = ||θ − θ|| 2 , which is a kind of square distance between the estimated parameter and the original.The conclusions from these results are similar as the obtained with the first methodology, although when we compare the measures for the relative error (RE), is noteworthy that all the descriptive measures are almost three times the correspondent to the first methodology.But in general, all these measures indicate that the local design ξ D fits well the model.

Efficiencies
Finally, we obtain the D-efficiencies following the same procedure described in section 3.1.4.In this case because we perturb three parameters: β 1 , β 2 and σ, we only have 3 3 = 27 combinations (the parameter λ = 0).But again most of all these designs were equal to the D-optimal design, so we only show in table 8 the six designs that correspond to a perturbation ∆ = 60% and were different to the optimal.In this table we use the symbols −, + or 0 to indicate the specific combinations of the parameters.
For instance, the first design is obtained when we disturb 60% to the left (−) the parameters β 1 and σ and we do not perturb (0) the parameter β 2 .Then the support points for this design are 1, 6.5 and 12 and the D-efficiency of the design is 0.93.It indicates that if we use this design instead of the unperturbed D-optimal design, we would need around 7% more observations to obtain the same efficiency that the D-optimal.Even more, it is remarkable that all six designs have exactly 3 support points: The extremes of the interval [1, 12] and the middle point 6.5.The only difference between these designs is the weight (in parentheses with two decimal places) and the D-efficiency, that can be 0.89 or 0.93, but in both cases it is high, so we can conclude that the D-optimal design is robust respect to the choice of the local value θ 0 .

Conclusions
We have presented a brief summary of two methodologies that can be implemented to find D-optimal designs when the model under study presents heteroscedasticity.In both cases the main problem is to find an expression for the Fisher information matrix of the model.We have illustrated both methods with Lake Cayuga data from which clearly do not have constant variance.However, there is an important difference between the methods that applied: The variance modelling methodology has the assumption that the errors of the original model has a normal distribution.However, the second methodology only requires a normal distribution for the transformed model, not the original.This can be an advantage of this methodology compared with variance modelling.
Under both methods, we find the same D-optimal design with two support points and with equal weights.But this fact is attributed only to the local values used in an independent way, that in this case were the estimations of the parameter vector using the data.Because the optimal design is local, we determine the robustness of this design with each methodology by disturbing the parameters of the corresponding model and calculating the D-efficiency of the obtained designs.In both cases, the efficiencies were high indicating that the D-optimal design is a robust design respect the choice of the local value θ 0 .Also, with each methodology we simulate 1, 000 observations of the model and calculate some descriptive measures for the relative and mean square errors.The results were similar.The only important difference is that measures for the relative errors of the second methodology were almost three times the correspondent to the first methodology.We cannot conclude which methodology is better because each one has its pros and shortcoming, with the example we obtained similar results.
Finally, we want to point out that we have not study two potential problems: First, the problem of heteroscedasticity for G optimality criterion and second, the problem of nonnormality (for D-optimality or not).Respect to the former, further work includes finding optimal designs for heteroscedastic models with another optimality criteria different to D-optimality.For instance, Wong & Cook (1993) have worked with G-optimal designs with linear models when the variance of the errors is incorporated in the model.With non normality, we did not find too many published papers, so this can be an interesting problem to work.Finally we have found local designs, but other option is to use the Bayesian approach.

Figure 1 :
Figure 1: Scatter plot of Lake Cayuga data.

Figure 2 :
Figure 2: Plot of the directional derivative.

Table 2 :
Generalized nonlinear least squares estimation.Next, we perform the likelihood ratio test to determine if the model with variance structure (22) is better than the model with constant variance (21).The results of the test are showed in table 3 (the model 1 corresponds to the model with variance structure (22) and the model 2 to the model with constant variance (21) ).

Table 3 :
ANOVA for the likelihood ratio test.

Table 5 :
Simulations with variance modelling (Std denotes the Standard deviation).

Table 7 :
Simulations for the logarithmic transformation model (Std denotes the Standard deviation).