Robustness of subset response surface designs to missing observations
Introduction
Response surface methodology (RSM) is an efficient tool of modern statistics, introduced by Box and Wilson (1951), which is used to study the relationship between one or more responses and a number of quantitative treatment factors. Although most published applications of RSM have been in the chemical and food industries, interest in RSM has spread to the biological, biomedical and biopharmaceutical fields. More recently, Ghadge and Raheman (2006) used RSM for process optimization for biodiesel production, Moberge et al. (2005) used RSM in a tandem mass spectrometry application, Huang et al. (2006) used RSM in a microbiological application to determine the optimum level of four protective agents and Kim and Akoh (2005) used RSM in modeling lipase catalyzed acidolysis in hexane.
There is a wide range of RSM applications but relatively few books on the subject, beginning with Myers (1971) and, more recently, Myers et al. (2009), who define the basics of RSM and then discuss some applications of response surface designs. Box and Draper (2007) discussed in detail the analysis and choice of response surface designs, with supplementary examples drawing on data from many published papers. Khuri and Cornell (1996) presented response surface methodology for successful experimentation, Khuri (2003) studied contemporary modeling approaches and design issues in RSM, Myers et al. (2004) reviewed the literature and, more recently, Khuri (2006) surveyed response surface methodology and related topics in detail. Anderson-Cook et al. (2009) revisited RSM, with an emphasis on prediction.
Experimenters usually prefer small response surface designs to save time, money and other resources. Such designs suit situations in which the experimenter is willing to lose information on some effects: some effects may not be estimable when a small design is used, and small designs lose considerable efficiency in estimating the linear and interaction coefficients. There are many real-world situations in which some runs must be replicated to obtain reliable estimates of the model effects, for example, as mentioned by Gilmour (2006), industrial and laboratory-based experiments on biotechnological processes, as well as some of the applications noted above. By using a larger design in a particular setup, we can avoid the harmful effects of aliasing on the estimation of main effects and also spare some degrees of freedom for testing lack of fit.
A wide class of three-level response surface designs called subset designs has been introduced by Gilmour (2006). These designs are useful in biotechnological processes and other applications, in which run-to-run variation is relatively high. Subset designs have many useful properties. They are easy to construct and allow the second-order model to be fitted efficiently as well as allowing testing of lack of fit and sparing enough degrees of freedom for the estimation of pure error.
Subset designs are constructed by using subsets of the factorial design. Let $S_r$, $r = 0, 1, \dots, q$, be the subset of points taken from the regular $3^q$ factorial design lying on the hypersphere of radius $r^{1/2}$ from the point at the center of the design, $S_0$. In this way, $S_r$ contains all points with $r$ factors at the $\pm 1$ levels and $q - r$ factors at the $0$ level. A subset design can be denoted as $c_q S_q + c_{q-1} S_{q-1} + \dots + c_0 S_0$, where $c_r$ is the number of replications of subset $S_r$. In order to fit a second-order model, a subset design must meet the following requirements:
- $c_r > 0$ for at least two $r$ and $c_r > 0$ for at least one $r$ with $1 \le r \le q - 1$, to enable all the quadratic effects to be estimated; and
- $c_r > 0$ for at least one $r \ge 2$, to enable all interaction effects to be estimated.
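The construction and the two requirements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each subset is taken to be the set of points of the three-level factorial with exactly r factors at the levels -1 or +1 and the remaining factors at 0, and the function names `subset` and `meets_requirements` are hypothetical, not taken from the paper. The check covers only the two listed requirements; it does not verify that the model matrix has full rank.

```python
from itertools import combinations, product

def subset(q, r):
    """Points of the three-level factorial with exactly r factors at +/-1
    and the other q - r factors at 0 (assumed definition of the subset)."""
    points = []
    for active in combinations(range(q), r):
        for signs in product((-1, 1), repeat=r):
            point = [0] * q
            for idx, sign in zip(active, signs):
                point[idx] = sign
            points.append(tuple(point))
    return points

def meets_requirements(q, reps):
    """reps maps r -> number of replications of the subset with r active factors."""
    used = [r for r, count in reps.items() if count > 0]
    # Quadratic effects: at least two subsets, one of them with 1 <= r <= q - 1.
    quadratic_ok = len(used) >= 2 and any(1 <= r <= q - 1 for r in used)
    # Interaction effects: at least one subset with two or more active factors.
    interaction_ok = any(r >= 2 for r in used)
    return quadratic_ok and interaction_ok

q = 3
sizes = {r: len(subset(q, r)) for r in range(q + 1)}
print(sizes)
print(meets_requirements(q, {3: 1, 1: 1, 0: 2}))  # factorial + axial + center runs
print(meets_requirements(q, {3: 1, 0: 4}))        # factorial + center runs only
```

For three factors the subsets contain 1, 6, 12 and 8 points for r = 0, 1, 2, 3. The second design fails the first requirement because factorial and center points alone cannot separate the individual quadratic effects.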
Some experimental situations might benefit from the construction of a subset design for multiple levels of each factor, for example in a controlled laboratory experiment where we can easily change and carefully control the levels of factors. More levels can be added by shifting the experiment to a spherical region or by using some other criterion. For a spherical region a subset design can be constructed by using the level $\sqrt{q/r}$ for $r > 0$, in $S_r$ instead of 1. However, for a spherical region only those subset designs are considered in which at least one center run is used so that the second-order model can be fitted (Gilmour, 2006).
The structure of subset designs is illustrated in Table 1. Here, four subsets are presented which can be used in a variety of combinations to construct subset designs for three factors. $S_3$ is the set of full factorial points, $S_2$ can be obtained by combining the two-level factorial design with an unreduced balanced incomplete block design (BIBD) as in the Box–Behnken design (BBD), $S_1$ is the set of axial points and $S_0$ consists of a single center run.
In this article minimax loss subset designs are also considered. These increase the number of levels of each factor by choosing the nonzero factor levels used in each subset so as to minimize the maximum loss from missing a design point. Minimax loss subset designs are discussed later in this paper. The subset designs considered here are used to fit a second-order polynomial model in which $p = (q+1)(q+2)/2$ terms are to be estimated, with expectation given by
$$E(y) = \beta_0 + \sum_{i=1}^{q} \beta_i x_i + \sum_{i=1}^{q} \beta_{ii} x_i^2 + \sum_{i=1}^{q-1} \sum_{j=i+1}^{q} \beta_{ij} x_i x_j. \qquad (1)$$
In matrix notation $E(\mathbf{y}) = X\beta$, where $E(\mathbf{y})$ is an $n \times 1$ vector of expected responses depending on the levels of the factor settings in the design matrix, $X$ is an $n \times p$ matrix with $i$th row being the vector $f'(x_i)$, where $f'(x) = (1, x_1, \dots, x_q, x_1^2, \dots, x_q^2, x_1 x_2, \dots, x_{q-1} x_q)$, and $\beta$ is a $p \times 1$ vector of unknown parameters, which is estimated by least squares as $\hat{\beta} = (X'X)^{-1} X'\mathbf{y}$. The covariance matrix of $\hat{\beta}$ is $\sigma^2 (X'X)^{-1}$.
Robust designs are attractive to experimenters because they perform well even when the experiment does not go as planned. In some biotechnological processes the unit cost per run is quite high and repeating an experiment is difficult or impossible. In such cases there is a clear need to construct subset designs that are more robust to missing observations.
The problem of missing observations and the robustness of response surface designs to missing observations are discussed in the next section. Robustness of subset designs to missing observations is discussed in Section 3 for cuboidal and spherical regions. The construction of minimax loss subset designs and the theoretical development of the minimax loss criterion are discussed in Section 4.
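As a rough illustration of these quantities, the sketch below builds the second-order model matrix for a three-factor design made up of the full factorial points, the axial points and two center runs, and computes a loss for each run. Two things here are assumptions rather than statements from this section: the loss from losing run i is taken to be the relative reduction in the determinant of X'X, and each subset is taken to be the points with exactly r factors at -1 or +1. By the matrix determinant lemma that loss equals the quadratic form f'(x_i)(X'X)^{-1}f(x_i), which is what the code evaluates. All function names are hypothetical, and exact fractions are used so that no numerical library is needed.

```python
from fractions import Fraction
from itertools import combinations, product

def subset(q, r):
    """Points with exactly r factors at +/-1 and q - r factors at 0 (assumed)."""
    pts = []
    for active in combinations(range(q), r):
        for signs in product((-1, 1), repeat=r):
            x = [0] * q
            for i, s in zip(active, signs):
                x[i] = s
            pts.append(tuple(x))
    return pts

def model_row(x):
    """Row (1, x_i, x_i^2, x_i x_j for i < j) of the second-order model matrix."""
    cross = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    terms = [1] + list(x) + [v * v for v in x] + cross
    return [Fraction(t) for t in terms]

def inverse(a):
    """Gauss-Jordan inverse of a square matrix of Fractions.
    Raises StopIteration if the matrix is singular."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def losses(design):
    """Assumed loss for each run i: 1 - |X_(i)'X_(i)| / |X'X| = f_i'(X'X)^{-1} f_i."""
    X = [model_row(x) for x in design]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    inv = inverse(xtx)
    return [sum(row[a] * inv[a][b] * row[b] for a in range(p) for b in range(p))
            for row in X]

design = subset(3, 3) + subset(3, 1) + 2 * subset(3, 0)   # 8 + 6 + 2 = 16 runs
loss = losses(design)
print(float(max(loss)), float(min(loss)))
```

The losses always sum to p, the number of model parameters (10 here), so a design cannot make every loss small at once; a minimax approach instead tries to keep the largest one as small as possible.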
Section snippets
Missing observations
We may be confronted with a situation in which some observations are lost or unavailable because of an accident or cost constraints, and their absence can seriously degrade the estimates of the regression coefficients. In experiments which can be implemented sequentially, the loss of one or two observations may not have such a bad effect on the estimation, because the experimenter gets a chance to replace the lost experimental run. For example, Turner et al. (2004) used sequential
Designs robust to missing observations
Robustness is a characteristic which provides protection to a design against departures from the usual assumptions of a good design, so that it performs well under a variety of underlying conditions. A design may be considered as robust if it is constructed under a particular criterion and performs well in some other respects. To minimize the effects of missing observations we require designs which are robust to missing observations. Box and Draper (2007) included robustness to missing values
Choosing to minimize the maximum loss
We now rebuild subset designs in such a way that the loss from missing an observation from the design becomes minimum, hence making the design more robust to missing data. This goal is achieved by using the minimax loss criterion. In order to make the calculations simpler, we rewrite the model in Eq. (1) asNow let represent the number of points of type being used in the design. Then where is the total number of
Robustness of prediction variance to missing observations
For the model in Eq. (1), the prediction variance of the expected response is given by $\operatorname{var}\{\hat{y}(x)\} = \sigma^2 f'(x) (X'X)^{-1} f(x)$, where $X$ is the model matrix and $f(x)$ is the function of the location in the design space at which an experimenter wants to predict, defined by the model as in the first section. The prediction variance is an important aspect to study in response surface designs. The ratio of the prediction variance for the design with a missing observation to the prediction variance for the full design can be quite useful to check
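One way to see how this ratio behaves is through the Sherman–Morrison identity. The following is a sketch under assumed notation, not a formula quoted from the paper: $f_i = f(x_i)$ is the model vector of the missing run, $X_{(i)}$ is the model matrix with that run deleted, and $h_{ii} < 1$ is assumed so that the reduced design can still fit the model.

```latex
\[
\bigl(X_{(i)}'X_{(i)}\bigr)^{-1}
  = (X'X)^{-1} + \frac{(X'X)^{-1} f_i f_i' (X'X)^{-1}}{1 - h_{ii}},
\qquad h_{ii} = f_i'(X'X)^{-1} f_i ,
\]
\[
\frac{\operatorname{var}_{(i)}\{\hat{y}(x)\}}{\operatorname{var}\{\hat{y}(x)\}}
  = 1 + \frac{\bigl\{f'(x)(X'X)^{-1} f_i\bigr\}^{2}}
             {(1 - h_{ii})\, f'(x)(X'X)^{-1} f(x)} .
\]
```

The ratio is never below 1, does not depend on $\sigma^2$, and at the location of the missing run itself ($x = x_i$) it reduces to $1/(1 - h_{ii})$, so runs with large $h_{ii}$ inflate the prediction variance most when they are lost.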
Discussion
When using response surface methodology, an experimenter has to choose a good design that meets the requirements of the experiment and fits the available resources. Minimax loss subset designs are optimal for minimizing the maximum loss for missing observations but they are not necessarily optimal according to other criteria. They can be useful in particular situations, especially where the experimenter has some freedom in choosing the distance of design points from the design center, for example in
References (20)
- Andrews and Herzberg (1979). The robustness and optimality of response surface designs. Journal of Statistical Planning and Inference.
- Ghadge and Raheman (2006). Process optimization for biodiesel production from mahua (Madhuca indica) oil using response surface methodology. Bioresource Technology.
- Akhtar and Prescott (1986). Response surface designs robust to missing observations. Communications in Statistics—Simulation and Computation.
- Anderson-Cook et al. (2009). Response surface design evaluation and comparison (with discussion). Journal of Statistical Planning and Inference.
- Box and Draper (2007). Response Surfaces, Mixtures, and Ridge Analyses.
- Box and Wilson (1951). On the experimental attainment of optimum conditions (with discussion). Journal of the Royal Statistical Society, Series B.
- et al. (2007). Optimization of copper cementation process by iron using central composite design experiments. Chemical Engineering Journal.
- Gilmour (2006). Response surface designs for experiments in bioprocessing. Biometrics.
- Herzberg, A.M., Andrews, D.F., 1976. Some considerations in the optimal design of experiments in non-optimal...
- Huang et al. (2006). Optimization of a protective medium for enhancing the viability of freeze-dried Lactobacillus delbrueckii subsp. bulgaricus based on response surface methodology. Journal of Industrial Microbiology and Biotechnology.
Cited by (34)
Central composite designs with three missing observations
2024, Applied Numerical Mathematics
Robustness of orthogonal-array based composite designs to missing data
2018, Journal of Statistical Planning and Inference
Citation Excerpt: The purpose was to construct some efficient designs that can estimate a second-order model by using fewer runs as compared with other second-order designs. Ahmad and Gilmour (2010) studied the robustness of subset response surface designs to missing observations. Later, by the minimax loss criterion, Ahmad et al. (2012) constructed augmented pairs minimax loss designs, which are more robust to one missing observation than the original augmented pairs designs.
Analysis of trace microcystins in vegetables using matrix solid-phase dispersion followed by high performance liquid chromatography triple-quadrupole mass spectrometry detection
2017, Talanta
Citation Excerpt: CCD delivers high quality predictions in studying linear, quadratic and interaction effect factors which influence a system, while interactions are unobserved in the normal orthogonal design and single factor tests. This design can still be estimated to performs well in case of loss one or two observations, and parameters of the assumed model without much loss of efficiency [40]. The PBD was preliminary applied for the detection of significant factors on method efficiency, as interaction effects are assumed to be negligible and only main effects are estimated.
Robustness of classical and optimal designs to missing observations
2017, Computational Statistics and Data Analysis
Citation Excerpt: Herzberg and Andrews (1976) considered the probability that a design will not estimate the desired model, and Andrews and Herzberg (1979) suggested maximizing the expected value of the determinant of the information matrix under possible missing observations. Akhtar and Prescott (1986) developed a criterion that minimizes the maximum loss due to missing observations and applied it to the evaluation and generation of central composite designs, and Ahmad and Gilmour (2010) used this measure to study the robustness of so-called subset designs (Gilmour, 2006). Herzberg et al. (1987) proposed equi-information designs, which retain equal information when up to two design points are missing.
Factorial and response surface designs robust to missing observations
2017, Computational Statistics and Data Analysis
Ultra-high performance liquid chromatographic determination of levofloxacin in human plasma and prostate tissue with use of experimental design optimization procedures
2016, Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences
Citation Excerpt: Moreover, CCD belongs to so called subset designs, which are considered as designs robust to missing observations. CCD might be applied in situations, in which some observations are missing due to some accidents or cost constraints, as this design performs well in case of loss of one or two observations, and parameters of the assumed model can still be estimated without much loss of efficiency [20]. One of the numerous applications of DoE is an assessment of method robustness.