Robustness of subset response surface designs to missing observations

https://doi.org/10.1016/j.jspi.2009.06.011

Abstract

Experiments designed to investigate the effect of several factors on a process have wide application in modern industrial and scientific research. Response surface designs allow the researcher to model the effects of the input variables on the response of the process. Missing observations can make the results of a response surface experiment quite misleading, especially in the case of one-off experiments or high cost experiments. Designs robust to missing observations can attract the user since they are comparatively more reliable. Subset designs are studied for their robustness to missing observations in different experimental regions. The robustness of subset designs is also improved for multiple levels by using the minimax loss criterion.

Introduction

Response surface methodology (RSM) is an efficient tool of modern statistics, introduced by Box and Wilson (1951), which is used to study the relationship between one or more responses and a number of quantitative treatment factors. Although most published applications of RSM have been in the chemical and food industries, interest in RSM has spread to the biological, biomedical and biopharmaceutical fields. More recently, Ghadge and Raheman (2006) used RSM for process optimization for biodiesel production, Moberge et al. (2005) used RSM in a tandem mass spectrometry application, Huang et al. (2006) used RSM in a microbiological application to determine the optimum level of four protective agents and Kim and Akoh (2005) used RSM in modeling lipase catalyzed acidolysis in hexane.

There is a wide range of RSM applications but a relative paucity of books on the subject, the first of which was Myers (1971) and now Myers et al. (2009). After defining the basics of RSM they discussed some applications of response surface designs. Box and Draper (2007) discussed in detail the analysis and choice of response surface design by using a variety of supplementary examples of the data used in many published papers. Khuri and Cornell (1996) presented response surface methodology for successful experimentation, Khuri (2003) studied contemporary modeling approaches and design issues in RSM, Myers et al. (2004) reviewed the literature and more recently Khuri (2006) surveyed response surface methodology and related topics in detail. Anderson-Cook et al. (2009) revisited RSM, with an emphasis on prediction.

Usually experimenters prefer small response surface designs to save time, money and other resources. These designs are suitable for situations in which the experimenter is willing to lose information on some effects, because some effects may not be estimated when a small design is used i.e. small designs suffer considerably in efficiency for estimation of linear and linear×linear interaction coefficients. There are many real world situations in which we require replications of some runs to obtain reliable estimates of the model effects—for example, as mentioned by Gilmour (2006), industrial and laboratory based experiments on biotechnological processes, as well as some of the applications noted above. By using a larger design, in a particular setup, we can avoid the harmful effects of aliasing on the estimation of main effects and we can also spare some degrees of freedom for the testing of lack-of-fit.

A wide class of three-level response surface designs called subset designs has been introduced by Gilmour (2006). These designs are useful in biotechnological processes and other applications, in which run-to-run variation is relatively high. Subset designs have many useful properties. They are easy to construct and allow the second-order model to be fitted efficiently as well as allowing testing of lack of fit and sparing enough degrees of freedom for the estimation of pure error.

Subset designs are constructed by using subsets of the $3^k$ factorial design. Let $S_r$, $r=1,\ldots,k$, be the subset of points taken from the regular $3^k$ factorial design lying on the hypersphere of radius $\sqrt{r}$ from the point at the center of the design, $S_0$. In this way, $S_r$ contains all points with $r$ factors at the $\pm 1$ levels and $k-r$ factors at the 0 level. A subset design can be denoted as $\nu_0 S_0+\nu_1 S_1+\cdots+\nu_k S_k$, where $\nu_r$ is the number of replications of subset $S_r$. In order to fit a second-order model, a subset design must meet the following requirements:

  • $\nu_r>0$ for at least two $r$ and for at least one $r$ with $1\le r\le k-1$, to enable all the quadratic effects to be estimated; and

  • $\nu_r>0$ for at least one $r>1$ to enable all interaction effects to be estimated.

These designs allow all linear main effects and interactions to be estimated orthogonally, but the quadratic effects are correlated with each other and with the intercept. Thus the designs are particularly useful for model selection, especially when the correlation between quadratic effects can be minimized. Subset designs are factorwise balanced for the second-order model, i.e. they have the same variance for all linear effects, the same variance for all quadratic effects and the same variance for all interaction effects. Gilmour (2006) discussed many types of subset designs and their projections to lower dimensional spaces. He suggested that the number of replications of different subsets can be chosen to meet certain optimality criteria such as D- or weighted-A-efficiency.

Some experimental situations might benefit from the construction of a subset design for multiple levels of each factor, for example in a controlled laboratory experiment where we can easily change and carefully control the levels of factors. More levels can be added by shifting the experiment to a spherical region or by using some other criterion. For a spherical region a subset design can be constructed by using the level $\alpha_r=\sqrt{k/r}$ for $r=1,2,\ldots,k$ in $S_r$ instead of 1. However, for a spherical region only those subset designs are considered in which at least one center run is used so that the second-order model can be fitted (Gilmour, 2006).
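The spherical levels can be checked with a short calculation. The sketch below assumes that the sphere has radius $\sqrt{k}$, the distance of the factorial points $S_k$ at levels $\pm 1$ from the center; the numerical values for $k=3$ are given purely as an illustration.

```latex
\[
\|x\|^2 = r\,\alpha_r^2 = k
\quad\Longrightarrow\quad
\alpha_r = \sqrt{k/r}, \qquad r = 1, 2, \ldots, k .
\]
\[
k = 3:\qquad
\alpha_1 = \sqrt{3} \approx 1.732, \qquad
\alpha_2 = \sqrt{3/2} \approx 1.225, \qquad
\alpha_3 = 1 .
\]
```

With these levels each factor takes seven distinct values ($0$, $\pm 1$, $\pm 1.225$, $\pm 1.732$) when all four subsets are used, compared with three in the cuboidal case.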

The structure of subset designs is illustrated in Table 1. Here, four subsets are presented which can be used in a variety of combinations to construct subset designs for three factors. $S_3$ is the set of $2^3$ full factorial points, $S_2$ can be obtained by combining the two-level factorial design with an unreduced balanced incomplete block design (BIBD) as in the Box–Behnken design (BBD), $S_1$ is the set of axial points and $S_0$ consists of a single center run.
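The four subsets for three factors can be enumerated directly from their definition. The following is a minimal sketch, not code from the paper; the function name `subset` and its `alpha` argument are hypothetical names chosen for illustration.

```python
from itertools import combinations, product


def subset(k, r, alpha=1.0):
    """Return the points of S_r: r factors at +/-alpha, the other k-r at 0."""
    points = []
    for active in combinations(range(k), r):          # which r factors are nonzero
        for signs in product((-1.0, 1.0), repeat=r):  # all 2^r sign patterns
            point = [0.0] * k
            for idx, sign in zip(active, signs):
                point[idx] = sign * alpha
            points.append(tuple(point))
    return points


k = 3
sizes = {r: len(subset(k, r)) for r in range(k + 1)}
print(sizes)  # {0: 1, 1: 6, 2: 12, 3: 8}
```

The sizes match the description above: one center run, six axial points, twelve Box–Behnken-type points and eight factorial points, which together make up the 27 points of the $3^3$ factorial.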

In this article minimax loss subset designs are also considered, which increase the number of levels of each factor by choosing the values of $\alpha_r$ ($r=1,2,\ldots,k-1$) so as to minimize the maximum loss from missing a design point. Minimax loss subset designs are discussed later in this paper. The subset designs considered here are used to fit a second-order polynomial model in which $p=(k+2)(k+1)/2$ terms are to be estimated, with expectation given by

$$\mu(x)=\beta_0+\sum_{i=1}^{k}\beta_i x_i+\sum_{i=1}^{k}\beta_{ii}x_i^2+\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\beta_{ij}x_i x_j. \qquad (1)$$

In matrix notation, $\mu(X)=X\beta$ is an $n\times 1$ vector of expected responses depending on the levels of the factor settings in the design matrix $X$, where $X$ is an $n\times p$ matrix whose $m$th row is the $1\times p$ vector

$$x_m^T=[1,x_{m1},\ldots,x_{mk},x_{m1}^2,\ldots,x_{mk}^2,x_{m1}x_{m2},\ldots,x_{m(k-1)}x_{mk}],$$

with $m=1,2,\ldots,n$. Here $\beta$ is a $p\times 1$ vector of unknown parameters, which is estimated by least squares as $\hat{\beta}=(X^TX)^{-1}X^Ty$, and the covariance matrix of $\hat{\beta}$ is $V(\hat{\beta})=\sigma^2(X^TX)^{-1}$.

Robust design techniques attract experimenters because they have many desirable characteristics. In some biotechnological processes the unit cost per run is quite high and repeating an experiment is difficult or impossible. In such cases there is a great need to construct subset designs that are more robust to missing observations. The problem of missing observations and the robustness of response surface designs to missing observations are discussed in the next section. Robustness of subset designs to missing observations is discussed in Section 3 for cuboidal and spherical regions. The construction of minimax loss subset designs and the theoretical development of the minimax loss criterion are discussed in Section 4.
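The least-squares fit of the second-order model can be sketched numerically. The example below is an illustration under stated assumptions, not the authors' code: the helper names `subset` and `model_matrix` are hypothetical, the design $S_0+S_1+S_3$ for $k=3$ is chosen only because it meets the requirements listed earlier, and the coefficients are made-up values used to generate noise-free responses.

```python
from itertools import combinations, product

import numpy as np


def subset(k, r, alpha=1.0):
    """Points of S_r: r factors at +/-alpha, the remaining k-r factors at 0."""
    pts = []
    for active in combinations(range(k), r):
        for signs in product((-1.0, 1.0), repeat=r):
            pt = [0.0] * k
            for idx, sign in zip(active, signs):
                pt[idx] = sign * alpha
            pts.append(pt)
    return pts


def model_matrix(points):
    """Rows [1, x_1..x_k, x_1^2..x_k^2, x_1 x_2, ..., x_{k-1} x_k]."""
    D = np.asarray(points, dtype=float)
    n, k = D.shape
    cross = [D[:, i] * D[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack([np.ones(n), D, D**2] + cross)


k = 3
p = (k + 2) * (k + 1) // 2                            # number of model terms
design = subset(k, 0) + subset(k, 1) + subset(k, 3)   # S0 + S1 + S3, n = 15
X = model_matrix(design)

rng = np.random.default_rng(0)
beta_true = rng.normal(size=p)   # made-up coefficients for the demonstration
y = X @ beta_true                # noise-free responses

# Solving the normal equations is equivalent to (X^T X)^{-1} X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(X.shape, np.allclose(beta_hat, beta_true))
```

Because the responses are generated without noise, the estimate reproduces the assumed coefficients, which confirms that all $p=10$ terms are estimable from this 15-run design.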

Section snippets

Missing observations

We may be confronted with a situation in which some observations are lost or unavailable due to some accident or cost constraints, and their absence can badly distort the estimates of the regression coefficients. In experiments which can be implemented sequentially, the loss of one or two observations may not have such a bad effect on the estimation, because the experimenter gets a chance to replace the lost experimental run. For example, Turner et al. (2004) used sequential

Designs robust to missing observations

Robustness is a characteristic which provides protection to a design against departures from the usual assumptions of a good design, so that it performs well under a variety of underlying conditions. A design may be considered as robust if it is constructed under a particular criterion and performs well in some other respects. To minimize the effects of missing observations we require designs which are robust to missing observations. Box and Draper (2007) included robustness to missing values
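One way to quantify the effect of a single missing observation can be sketched as follows. This excerpt does not state the loss definition, so the sketch assumes the relative reduction in $|X^TX|$ when row $i$ is deleted; by the matrix determinant lemma this equals the leverage $x_i^T(X^TX)^{-1}x_i$. The helper name `model_matrix` is hypothetical, and the full $3^3$ factorial (that is, $S_0+S_1+S_2+S_3$) is used only as an example design.

```python
from itertools import combinations, product

import numpy as np


def model_matrix(points):
    """Rows [1, x_1..x_k, x_1^2..x_k^2, x_1 x_2, ..., x_{k-1} x_k]."""
    D = np.asarray(points, dtype=float)
    n, k = D.shape
    cross = [D[:, i] * D[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack([np.ones(n), D, D**2] + cross)


# Full 3^3 factorial: S0 + S1 + S2 + S3 with one replicate each (n = 27).
design = list(product((-1.0, 0.0, 1.0), repeat=3))
X = model_matrix(design)
M = X.T @ X
det_full = np.linalg.det(M)

# Assumed loss: 1 - |X_(-i)^T X_(-i)| / |X^T X| for each deleted row i.
loss_det = []
for i in range(len(X)):
    X_reduced = np.delete(X, i, axis=0)
    loss_det.append(1.0 - np.linalg.det(X_reduced.T @ X_reduced) / det_full)
loss_det = np.array(loss_det)

# The same quantity via the leverage x_i^T (X^T X)^{-1} x_i of each run.
leverage = np.einsum("ij,ij->i", X @ np.linalg.inv(M), X)

print(np.allclose(loss_det, leverage), float(leverage.max()))
```

Under this assumed definition, the maximum of `leverage` is the worst-case loss for the design, which is the quantity a minimax loss criterion would seek to reduce.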

Choosing αr to minimize the maximum loss

We now rebuild subset designs in such a way that the loss from missing an observation is minimized, hence making the design more robust to missing data. This goal is achieved by using the minimax loss criterion. In order to make the calculations simpler, we rewrite the model in Eq. (1) as

$$\mu(x)=\sum_{i=1}^{k}\beta_i x_i+\beta_0+\sum_{i=1}^{k}\beta_{ii}x_i^2+\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\beta_{ij}x_i x_j.$$

Now let $\delta_r$ represent the number of points of type $S_r$ being used in the design. Then

$$\delta_r=2^r\binom{k}{r}\nu_r,\qquad r=0,1,\ldots,k,$$

where $\nu_r$ is the total number of
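The point counts $\delta_r$ follow from simple arithmetic, sketched below. The function name `points_per_subset` and the replication numbers $\nu=(2,1,1,1)$ are hypothetical and serve only as an illustration for $k=3$.

```python
from math import comb


def points_per_subset(k, nu):
    """delta_r = 2^r * C(k, r) * nu_r for r = 0..k, where nu[r] replicates S_r."""
    return [2**r * comb(k, r) * nu[r] for r in range(k + 1)]


delta = points_per_subset(3, [2, 1, 1, 1])  # 2*S0 + S1 + S2 + S3
print(delta, sum(delta))  # [2, 6, 12, 8] 28
```

The total run size of the design is the sum of the $\delta_r$, here 28 runs.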

Robustness of prediction variance to missing observations

For the model in Eq. (1), the prediction variance of the expected response is given by

$$\mathrm{Var}[\hat{\mu}_{x_m}]=\sigma^2 x_m^T(X^TX)^{-1}x_m,$$

where $x_m$ is the function of the location in the design space at which an experimenter wants to predict, defined by the model as in the first section. The prediction variance is an important aspect to study in response surface designs. The ratio of prediction variances for the design with a missing observation to the prediction variance for the full design can be quite useful to check
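The prediction variance ratio can be computed along the following lines. This is a minimal sketch under stated assumptions: $\sigma^2$ is set to 1, the full $3^3$ factorial stands in for a subset design, the prediction location $(0.5,-0.5,0)$ is an arbitrary illustrative choice, and the helper names `model_matrix` and `prediction_variance` are hypothetical.

```python
from itertools import combinations, product

import numpy as np


def model_matrix(points):
    """Rows [1, x_1..x_k, x_1^2..x_k^2, x_1 x_2, ..., x_{k-1} x_k]."""
    D = np.asarray(points, dtype=float)
    n, k = D.shape
    cross = [D[:, i] * D[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack([np.ones(n), D, D**2] + cross)


def prediction_variance(X, x, sigma2=1.0):
    """sigma^2 * x^T (X^T X)^{-1} x for a model-expanded location x."""
    return sigma2 * x @ np.linalg.solve(X.T @ X, x)


# Full 3^3 factorial: S0 + S1 + S2 + S3 with one replicate each (n = 27).
design = list(product((-1.0, 0.0, 1.0), repeat=3))
X = model_matrix(design)

x0 = model_matrix([(0.5, -0.5, 0.0)])[0]   # illustrative prediction location
full = prediction_variance(X, x0)

# Ratio of prediction variances: design with run i missing vs. full design.
ratios = np.array([
    prediction_variance(np.delete(X, i, axis=0), x0) / full
    for i in range(len(X))
])
print(float(ratios.min()), float(ratios.max()))
```

Every ratio is at least 1, since deleting a run can never decrease the prediction variance; the largest ratio identifies the run whose loss hurts prediction at this location the most.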

Discussion

When using response surface methodology, an experimenter has to choose a good design to meet his requirements and the available resources. Minimax loss subset designs are optimal for minimizing the maximum loss for missing observations but they are not necessarily optimal according to other criteria. They can be useful in particular situations, especially where the experimenter has some liberty about deciding upon the choice of distance of design points from the design center, for example in

References (20)

  • D.F. Andrews et al. The robustness and optimality of response surface designs. Journal of Statistical Planning and Inference (1979)
  • S.V. Ghadge et al. Process optimization for biodiesel production from mahua (Madhuca indica) oil using response surface methodology. Bioresource Technology (2006)
  • M. Akhtar et al. Response surface designs robust to missing observations. Communications in Statistics—Simulation and Computation (1986)
  • C.M. Anderson-Cook et al. Response surface design evaluation and comparison (with discussion). Journal of Statistical Planning and Inference (2009)
  • G.E.P. Box et al. Response Surfaces, Mixtures, and Ridge Analyses (2007)
  • G.E.P. Box et al. On the experimental attainment of optimum conditions (with discussion). Journal of the Royal Statistical Society, Series B (1951)
  • D. Djoudi et al. Optimization of copper cementation process by iron using central composite design experiments. Chemical Engineering Journal (2007)
  • S.G. Gilmour. Response surface designs for experiments in bioprocessing. Biometrics (2006)
  • Herzberg, A.M., Andrews, D.F., 1976. Some considerations in the optimal design of experiments in non-optimal...
  • L.Z. Huang et al. Optimization of a protective medium for enhancing the viability of freeze-dried Lactobacillus delbrueckii subsp. bulgaricus based on response surface methodology. Journal of Industrial Microbiology and Biotechnology (2006)
There are more references available in the full text version of this article.

Cited by (34)

  • Robustness of orthogonal-array based composite designs to missing data

    2018, Journal of Statistical Planning and Inference
    Citation Excerpt :

    The purpose was to construct some efficient designs that can estimate a second-order model by using fewer runs as compared with other second-order designs. Ahmad and Gilmour (2010) studied the robustness of subset response surface designs to missing observations. Later, by the minimax loss criterion, Ahmad et al. (2012) constructed augmented pairs minimax loss designs, which are more robust to one missing observation than the original augmented pairs designs.

  • Analysis of trace microcystins in vegetables using matrix solid-phase dispersion followed by high performance liquid chromatography triple-quadrupole mass spectrometry detection

    2017, Talanta
    Citation Excerpt :

    CCD delivers high quality predictions in studying linear, quadratic and interaction effect factors which influence a system, while interactions are unobserved in the normal orthogonal design and single factor tests. This design can still be estimated to performs well in case of loss one or two observations, and parameters of the assumed model without much loss of efficiency [40]. The PBD was preliminary applied for the detection of significant factors on method efficiency, as interaction effects are assumed to be negligible and only main effects are estimated.

  • Robustness of classical and optimal designs to missing observations

    2017, Computational Statistics and Data Analysis
    Citation Excerpt :

    Herzberg and Andrews (1976) considered the probability that a design will not estimate the desired model, and Andrews and Herzberg (1979) suggested maximizing the expected value of the determinant of the information matrix under possible missing observations. Akhtar and Prescott (1986) developed a criterion that minimizes the maximum loss due to missing observations and applied it to the evaluation and generation of central composite designs, and Ahmad and Gilmour (2010) used this measure to study the robustness of so-called subset designs (Gilmour, 2006). Herzberg et al. (1987) proposed equi-information designs, which retain equal information when up to two design points are missing.

  • Factorial and response surface designs robust to missing observations

    2017, Computational Statistics and Data Analysis
  • Ultra-high performance liquid chromatographic determination of levofloxacin in human plasma and prostate tissue with use of experimental design optimization procedures

    2016, Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences
    Citation Excerpt :

    Moreover, CCD belongs to so called subset designs, which are considered as designs robust to missing observations. CCD might be applied in situations, in which some observations are missing due to some accidents or cost constraints, as this design performs well in case of loss of one or two observations, and parameters of the assumed model can still be estimated without much loss of efficiency [20]. One of the numerous applications of DoE is an assessment of method robustness.
