Robustness of sequential third-order response surface design to missing observations

Response surface designs are generally used in process/product optimization studies. Sequential third-order response surface designs are advantageous when the experimenter encounters a significant lack of fit in a fitted second-order model while establishing the relationship between the input and response variables. In some experimental situations, responses pertaining to certain design points may be destroyed, unobtainable or inaccessible. The unavailability of the observations pertaining to certain experimental run(s) affects both the design properties and the analysis of variance. In this paper, we examine the robustness of sequential third-order rotatable designs and investigate the loss of information when one or two observations, located at different radii from the design centre, are missing. It has been found that the maximum loss of information, and hence the minimum design efficiency, occurs when an observation at a design point at a higher radius from the design centre is lost.


Introduction
Response surface designs (RSDs) are widely used for Response Surface Methodology (RSM) based optimization studies, which aid in exploring the relationship between a group of explanatory variables and one or more response variable(s). The nature of the relationship between them is usually unknown and is approximated with a polynomial of some order (first, second, third, and so on). A set of N design points that allows fitting a first-, second-, or third-order response model and measuring its adequacy is called a first-, second-, or third-order response surface design. The most commonly used second-order response surface designs are the $3^v$ factorial design (v is the number of factors), the Box-Behnken design (BBD) and the Central Composite Design (CCD). In certain situations, the fitted second-order response surface model may be inadequate (i.e. the lack of fit is significant even though the fitted model is significant). In such circumstances, inference drawn from the fitted model would be misleading. Therefore, one would be interested in exploring the input-response relationship through a higher order model. Fitting a third-order model after experimenting with an entirely new third-order response surface design would be expensive. In such situations, it is desirable to execute a sequential third-order response surface design, in which additional runs are carried out in continuation of the second-order response surface design without discarding the responses obtained with the initial design (first stage design); this is cost-effective and uses fewer resources than a non-sequential design [1,2].
Situations may arise in which observations pertaining to some design points are lost or destroyed during experimentation, or are unobtainable or inaccessible because of limited resources, an inability to run a specific factor level combination, or the avoidance of some points due to their high cost. The unavailability of the observations affects both the design properties and the analysis of variance. Most of the previous studies (listed in section 1.1) focused on the effect of missing observation(s) in second-order rotatable designs. Therefore, we are concerned with the robustness of sequential third-order rotatable designs and investigate the loss of information when one or two responses are missing in the existing sequential third-order rotatable designs.

Robustness of response surface designs (RSDs)
The desirable properties of a response surface design were outlined by Box and Hunter [3], who state that the experimenter should consider: appropriate assumptions regarding the nature of the relationship between the inputs and the response; the correctness of the model under those assumptions; the capability of the model to give good estimates of the model parameters and good predictions of new observations; testing of the lack of fit; homogeneity of the error variance; planning the experiment so that it resists the presence of outliers and is robust to errors in the design levels; flexibility of the model for increasing order; the possibility of performing the experiments in blocks; and the cost-effectiveness of the design used for experimentation. These properties reveal that designing an experiment is not necessarily easy and should involve balancing multiple objectives rather than focusing on a single characteristic. For details on RSDs and associated properties, one may refer to the brief literature review on response surface designs by Hemavathi et al. [4], which broadly covers seventy years of research in the area of RSDs.
RSM based optimization trials may sometimes encounter outliers, missing observations, a systematic trend in the blocks, etc.; therefore, designs that are insensitive or robust to such disturbance(s) are needed and are called robust response surface designs. In some experimental situations, it quite often happens that some observations are lost or become unavailable. The missing rows of X can have adverse effects on the design properties. A design that guards against such effects is called robust to missing observations. The problem of missing values in RSDs was first investigated by Draper [5], followed by McKee and Kshirsagar [6]. Box and Draper [7] pointed out the connection between the diagonal values of the hat matrix and the robustness of a design. Herzberg and Andrews [8] introduced two measures. The first is the probability of breakdown, which considers the probability that an observation is missing at a design point and is defined as the probability that the design will not estimate all the unknown parameters of the model; a smaller value indicates a more robust design. The second measure is the expected precision of the design estimates, based on the variance of the predicted observation. Based on the Fisher information matrix, Andrews and Herzberg [9] suggested an efficiency measure that helps to discriminate between designs when the D-optimality criterion and the BD [7] outlier criterion fail to distinguish them.
Using the two criteria, Akhtar and Prescott [10] and MacEachern et al. [11] examined the loss of information for CCD, BBD [12] and the split-plot central composite design [13] with respect to missing a single observation, and with respect to missing three observations [14]. Some attempts were made to study the robustness of response surface designs against violation of the assumption on the error distribution [15,16] and against outliers [17]. A measure based on per cent loss of information has been developed and used to obtain CCDs and BBDs that are robust to one missing observation, with marginal percentage loss in information and high D-efficiencies [18]. Ahmad and Gilmour [19] studied the robustness of subset response surface designs to one missing observation by computing the ratio of the prediction variance of the design with a missing observation to the prediction variance of the full design.
Akhtar and Prescott [10] developed a measure, the minimax loss criterion, which considers the relationship between the reduced and complete design matrices, and obtained augmented pairs minimax loss designs that are more robust to one missing observation than the augmented pairs design [20]. Based on the minimax loss criterion, Alrweili et al. [21] constructed a minimax loss response surface design that is robust to a missing observation. Rashid et al. [22,23] examined the robustness of Augmented Box-Behnken Designs (ABBD) and Augmented Fractional Box-Behnken Designs (AFBBD) for various values of alpha using the minimax loss criterion and found that the precision of the model parameter estimates and the relative A-, D- and G-efficiencies are less affected by missing a centre run than by missing other design points.
The probability of estimability [24] helps in determining the maximum number of observations that can be missing while still allowing the estimation of the second-order response surface model. This criterion gives a better assessment than the $t_{max}$ criterion [25] and the loss of D-efficiency [26]. Missing a single observation affects the estimation and predictive capability of the CCD, and the most significant loss occurs when factorial points are missing [27], whereas screening designs are less affected by a missing observation except in the case of a small number of runs [28]. To prevent the optimal design from being too sensitive to missing observations, Da Silva et al. [29] incorporated a measure of leverage uniformity in the compound design criterion.
Apart from the numerical measures, design robustness can be assessed by graphical methods, viz. the extended scaled prediction variance (ESPV) and the extended spherical average prediction variance (ESAPV) [30].

Robustness measures
In general, if m observations are missing from an experiment, then the corresponding m rows of the matrix X are missing; these rows are denoted by $X_m$, where m is the number of missing observations (m = 1, 2, 3, ...). Let $X_r'X_r$ denote the reduced information matrix obtained from the available data after m observations are lost, and let $X'X$ denote the complete information matrix obtained from the complete experimental data with no missing observations. In the case of missing observations, the matrix X is partitioned as

$$X = \begin{bmatrix} X_m \\ X_r \end{bmatrix}, \tag{1}$$

where $X_m$ and $X_r$ are the $m \times p$ matrix of the m rows of missing observations and the $r \times p$ matrix of the r rows of remaining available data, respectively. The relationship between the reduced and the complete information matrix is

$$X'X = X_m'X_m + X_r'X_r$$

or

$$X_r'X_r = X'X - X_m'X_m. \tag{2}$$

Post-multiplying both sides of Equation (2) by $(X'X)^{-1}$ gives

$$(X_r'X_r)(X'X)^{-1} = I_p - (X_m'X_m)(X'X)^{-1}, \tag{3}$$

and post-multiplying Equation (3) by $X'X$ gives

$$X_r'X_r = \left[I_p - (X_m'X_m)(X'X)^{-1}\right](X'X). \tag{4}$$

The determinant of the reduced information matrix is

$$|X_r'X_r| = \left|I_p - (X_m'X_m)(X'X)^{-1}\right|\,|X'X|. \tag{5}$$

Akhtar and Prescott [10] described this relationship as

$$|X_r'X_r| = E_m\,|X'X|, \tag{6}$$

where $E_m = \left|I_p - (X_m'X_m)(X'X)^{-1}\right|$ corresponds to the diagonal element of $(I - R)$ for the m missing design points. Akhtar and Prescott [10] measured the loss of efficiency by considering the relationship between the determinants of the reduced and the complete information matrices. The loss due to missing any set of m observations, relative to the complete set of observations, is measured by

$$L_m = \frac{|X'X| - |X_r'X_r|}{|X'X|} = 1 - E_m, \tag{7}$$

where $E_m$ is the efficiency of the reduced design with m missing design points relative to the complete design, and p is the number of parameters. The values of $L_m$ lie between 0 and 1. If $L_m = 1$, then Equation (7) becomes

$$|X_r'X_r| = 0.$$

This implies that the reduced information matrix is singular.
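As a worked special case, the efficiency can be rewritten as an $m \times m$ determinant. This is only a sketch: it assumes that R denotes the hat matrix $R = X(X'X)^{-1}X'$ (the text does not define R explicitly), and it uses the standard identity $|I_p - AB| = |I_m - BA|$ with $A = X_m'$ and $B = X_m(X'X)^{-1}$. The symbols $R_{mm}$, $h_{ii}$ and $h_{ij}$ are introduced here for illustration.

$$E_m = \left|I_p - X_m'X_m(X'X)^{-1}\right| = \left|I_m - X_m(X'X)^{-1}X_m'\right| = \left|I_m - R_{mm}\right|,$$

where $R_{mm}$ is the $m \times m$ block of R belonging to the missing rows. For m = 1, with missing row $x_i'$ and $h_{ii} = x_i'(X'X)^{-1}x_i$,

$$E_1 = 1 - h_{ii}, \qquad L_1 = h_{ii},$$

and for m = 2, with missing rows i and j,

$$E_2 = (1 - h_{ii})(1 - h_{jj}) - h_{ij}^2, \qquad L_2 = 1 - E_2.$$

Under this reading, the loss from one missing run is simply the leverage of that run, which is why runs far from the design centre tend to give larger losses.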
The loss due to missing any one observation, relative to the complete set of observations, is denoted by $L_1$ and is given by

$$L_1 = \frac{|X'X| - |X_1'X_1|}{|X'X|} = 1 - E_1, \tag{8}$$

where $|X'X|$ is the determinant for the complete design, $|X_1'X_1|$ is the determinant when one response is lost (reduced design), i.e. the determinant from the available data after one observation is missing, and $E_1$ is the efficiency of the reduced design relative to the complete design.
The maximum loss is

max $L_1$ = max($L_{f1}$, $L_{a1}$, $L_{c1}$, $L_{b1}$, $L_{cb1}$, $L_{fb1}$, $L_{fbc1}$),

where $L_{f1}$ = loss of information due to missing one observation from the factorial points, F
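The one-missing-observation loss can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the helper name `one_missing_loss` and the small two-factor central composite design with a second-order model are assumptions chosen only to keep the example short. The sketch evaluates $L_1$ and $E_1$ from the determinant ratio and compares the loss with the diagonal of the hat matrix.

```python
import numpy as np

def one_missing_loss(X):
    """Loss L1 and efficiency E1 for deleting each row of X in turn."""
    X = np.asarray(X, dtype=float)
    _, logdet_full = np.linalg.slogdet(X.T @ X)
    eff = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        Xr = np.delete(X, i, axis=0)              # reduced design: run i missing
        sign, logdet_red = np.linalg.slogdet(Xr.T @ Xr)
        # E1 = |Xr'Xr| / |X'X|; a non-positive determinant means non-estimability
        eff[i] = np.exp(logdet_red - logdet_full) if sign > 0 else 0.0
    return 1.0 - eff, eff

# Illustrative design (an assumption, not from the paper): two-factor CCD
a = np.sqrt(2.0)
pts = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],       # factorial points
                [-a, 0], [a, 0], [0, -a], [0, a],         # axial points
                [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]])  # centre points
x1, x2 = pts[:, 0], pts[:, 1]
# Second-order model matrix: 1, x1, x2, x1*x2, x1^2, x2^2 (p = 6)
X = np.column_stack([np.ones(len(pts)), x1, x2, x1 * x2, x1**2, x2**2])

L1, E1 = one_missing_loss(X)
leverage = np.diag(X @ np.linalg.solve(X.T @ X, X.T))     # hat-matrix diagonal

for label, idx in [("factorial", 0), ("axial", 4), ("centre", 8)]:
    print(f"{label:9s}  L1 = {L1[idx]:.3f}  E1 = {E1[idx]:.3f}")
print("max L1 =", round(L1.max(), 3))
```

Working with `slogdet` rather than `det` avoids overflow or underflow when the information matrix has many columns, which matters more for third-order models than for this small example.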

Sequential Third-Order Rotatable Designs (STORDs)
Higher order RSDs are classified as either sequential or non-sequential designs based on the way in which they are executed during experimentation. Sequential response surface designs are executed step by step and are cost-effective, whereas in non-sequential response surface designs all the runs are executed in one stretch, which is expensive and time-consuming. When the lack of fit of the second-order model becomes significant, the responses generated from the trials will not be useful for optimization, which may lead to a huge loss in terms of resources and time.
STORDs are constructed by choosing N design points, which constitute sets of equally spaced points with different radii of order ≥ d and have the same prediction variance at all design points that are equidistant from the design centre [1]. STORDs can be obtained either as (i) sequential in runs, from a second-order design to a third-order design in a sequential manner (GHC design: Gardiner et al. [1]; $T_1$ to $T_9$ designs: Draper [31-34]; Das and Narasimham [35]; Herzberg [36]; Augmented Box-Behnken designs (ABBDs): Arshad et al. [37]; Augmented fractional Box-Behnken designs (AFBBDs): Rashid et al. [38]; Cornelious and Cruyf [39]), or as (ii) sequential in factors, from a lower dimension to a higher dimension (Huda [40,41]; Koske et al. [42,43]). The practical implication of a sequential response surface design in the situation of an inadequate fit of the second-order model is illustrated in section 3.2.

Application potential of STORDs for optimization trials: an illustration
This section explains the application potential of STORDs (sequential in runs) in fishery experimentation for optimizing the degree of enzymatic hydrolysis of fish protein, using a hypothetical data set. The enzymatic hydrolysis of fish protein has been employed as an alternative approach for valorizing underutilized fish biomass. To optimize the degree of hydrolysis, RSM was employed. The objective of the study was to find the optimum conditions of hydrolysis pH (factor A), roughly in the range 7.0-9.0, hydrolysis temperature (factor B), roughly in the range 40-60°C, and enzyme-substrate ratio (E/S) (factor C), roughly in the range 2.0-4.0, that maximize the degree of hydrolysis (DH) %.
Stage 1: The experiment was planned using a second-order Box-Behnken Design (BBD) for three factors, each at three levels, with a total of 16 experimental units. The simulated experimental responses, along with actual and coded values, are listed in Table 1. The analysis was carried out using a well-known software package for RSM, Design-Expert 13.0 (Licensed to ICAR-Central Marine Fisheries Research Institute). Initially, a second-order model was fitted to the data; the fitted model in terms of actual factors is given in Equation (9). The second-order model-based ANOVA using the 16 runs is given in Table 2.
It is evident from Table 2 that the lack of fit of the model is significant even though the second-order model gives a significant fit (p < 0.05). This indicates that the fitted model is not appropriate for optimization, and the estimation of third-order parameters needs to be explored further. This is possible when a new set of runs (to be carried out in stage 2) augments the existing runs so that the combined runs satisfy the property of third-order rotatability, which also keeps the experimentation cost-effective.
The third-order model is fitted to the responses taken at both stages and is given in Equation (10), with the ANOVA in Table 4. The non-significance of the lack of fit (Table 4), the increase in adjusted $R^2$ from 0.67 [Equation (9)] to 0.99 [Equation (10)], and the low BIC and AIC indicate that the third-order model provides a better functional relationship between the response and the input variables than the second-order model, and hence it is better suited for further optimization.
In subsequent sections, we have examined the robustness of different classes of STORDs.

Maximum information loss and efficiency of sequential third-order response surface designs against missing observation
In this section, we examined the robustness of sequential third-order rotatable design and obtained the loss of information when one response at each set of points is missing with the help of the measure given in Equation (8) [10]. In the investigation, we have also considered the case of missing two observations for smaller designs and worked out the maximum losses and design efficiency.
The third-order designs constitute various combinations of sets of points (as given in Table 5). Missing one observation at different radii has a varying impact on the information loss and efficiency of the design (Table 5). The robustness of sequential third-order designs and sequential asymmetric third-order rotatable designs (SATORDs) of CLASS I and CLASS II for v = 3 [2] is also examined (Table 6). The information loss and efficiency of the design are measured by considering one missing observation from different sets of points. The maximum losses of information given in Tables 5 and 6 show that the loss of information increases when the missing response corresponds to a design point at a greater distance from the design centre.

Robustness of sequential third-order design of smaller runs
A researcher is usually interested in a design that can be implemented with minimum cost and resource use, and thus preferably chooses a design with a smaller number of runs. To study robustness, we have selected the G-efficient designs (Arshad et al. [37] and Rashid et al. [38]) for v = 3-7 factors, since designs with more than seven factors are of less interest. We have examined the robustness of the designs with respect to missing one and two observations in ABBDs and AFBBDs with specific values of the parameters (denoted by d and e); the results are given in Tables 7 and 8. For v = 3, the ABBD has 12 points of $B[3]$, (±1, ±1, 0); 8 points of $F[d]^3$, (±d, ±d, ±d); 6 points of $A[e]^3$, (±e, 0, 0); and four centre points $C[0]^v$, (0, 0, 0), with parameter values d = 1.01 and e = 1.7494 (vide Table 3). When one response at the points $B[3]$ is lost, the design loses 65% of the information ($L_{b1} = 0.65$) and has 35% efficiency ($E_{b1} = 0.35$). Similarly, when one observation is lost at the factorial points, the design has only 19% efficiency ($E_{f1} = 0.19$) and 81% of the information is lost ($L_{f1} = 0.81$); missing one observation at the axial points leads to a design with 83% loss of information ($L_{a1} = 0.83$) and 17% efficiency ($E_{a1} = 0.17$); whereas missing an observation at a centre point gives a smaller loss of information ($L_{c1} = 0.20$) and a higher efficiency ($E_{c1} = 0.80$). Among all design points, max($L_{f1}$, $L_{a1}$, $L_{c1}$, $L_{b1}$) = 0.83; that is, the maximum loss of information happens when an observation is lost at an axial point.
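The v = 3 case described above can be sketched numerically. This is a hedged illustration rather than the authors' computation: it assumes a full third-order polynomial in three factors (20 parameters) and the point sets exactly as listed in the text, and the helper names `abbd_v3` and `cubic_model_matrix` are hypothetical. Under these assumptions the computed losses should round to the values quoted in the text.

```python
import itertools
import numpy as np

def abbd_v3(d=1.01, e=1.7494, n_centre=4):
    """Design points of the v = 3 ABBD described in the text, with set labels."""
    pts, labels = [], []
    # B[3]: (+-1, +-1, 0) and its permutations, 12 points
    for i, j in itertools.combinations(range(3), 2):
        for si, sj in itertools.product((-1.0, 1.0), repeat=2):
            p = [0.0, 0.0, 0.0]
            p[i], p[j] = si, sj
            pts.append(p)
            labels.append("B")
    # F[d]^3: (+-d, +-d, +-d), 8 points
    for signs in itertools.product((-1.0, 1.0), repeat=3):
        pts.append([d * s for s in signs])
        labels.append("F")
    # A[e]^3: (+-e, 0, 0) and its permutations, 6 points
    for i in range(3):
        for s in (-1.0, 1.0):
            p = [0.0, 0.0, 0.0]
            p[i] = s * e
            pts.append(p)
            labels.append("A")
    # centre points
    for _ in range(n_centre):
        pts.append([0.0, 0.0, 0.0])
        labels.append("C")
    return np.array(pts), labels

def cubic_model_matrix(pts):
    """All monomials x1^a x2^b x3^c with a + b + c <= 3 (20 columns)."""
    exps = [ex for ex in itertools.product(range(4), repeat=3) if sum(ex) <= 3]
    return np.column_stack([np.prod(pts ** np.array(ex), axis=1) for ex in exps])

pts, labels = abbd_v3()
X = cubic_model_matrix(pts)
_, logdet_full = np.linalg.slogdet(X.T @ X)

loss = {}
for name in ("B", "F", "A", "C"):
    i = labels.index(name)                   # one representative run per set
    Xr = np.delete(X, i, axis=0)             # reduced design with that run missing
    sign, logdet_red = np.linalg.slogdet(Xr.T @ Xr)
    eff = np.exp(logdet_red - logdet_full) if sign > 0 else 0.0
    loss[name] = 1.0 - eff
    print(f"{name}: L1 = {loss[name]:.2f}, E1 = {eff:.2f}")
print("maximum L1 occurs in set:", max(loss, key=loss.get))
```

One run per set is enough here because the design is symmetric under permutations and sign changes of the factors, so every run within a set has the same loss.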
For v = 4, 5, the loss of information is maximum when one observation is missed at the BBD points, followed by the factorial, axial and centre points with L b1 = 0. The robustness of AFBBDs in terms of the maximum loss due to missing one observation ($L_1$) and the efficiency ($E_1$) at all possible combinations is measured and given in Table 4. For v = 3, when one response at the points $\tfrac{1}{2}BB[3]$ is lost, the reduced design yields a complete loss of information ($L_{b1} = 1$, $E_{b1} = 0$). Similarly, for v = 4, 5 and 7, the loss of one response at the fractional BBD points $\tfrac{1}{2}BB[v]$ results in non-estimability. When one response at the axial points is lost, the design also exhibits a high information loss ($L_{a1}$) of 0.90, 0.92, 0.92, 0.93 and 0.89, and a low efficiency ($E_{a1}$) of 0.10, 0.08, 0.08, 0.07 and 0.11, respectively, for 3 ≤ v ≤ 7. The loss of information ($L_{c1} = 0.25$) and efficiency ($E_{c1} = 0.75$) when one response pertaining to a centre point is missing are the same for v = 3-6, while $L_{c1} = 0.10$ and $E_{c1} = 0.90$ for v = 7.
We have also considered the case of missing two responses, either within one of the sets $F[d]^v$, $A[e]^v$, $B[v]$, $C[0]^v$, or across all possible pairs of the sets. Missing any two observations in ABBDs results in more than 70% loss in information ($L_2 > 0.70$), except for centre points. For v = 3 and v = 4, the respective loss of information ($L_2$) and efficiency ($E_2$) of the design when any possible combination of two observations is missing are given in Table 9. The design has the minimum loss of information when the two lost observations are both centre points rather than points from any other sets. Estimation of all the parameters is not possible when two observations at axial points are lost (vide Table 9).
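The two-missing-observation case can be sketched in the same spirit. The snippet below is an illustration under stated assumptions, not the authors' procedure: it uses the v = 3 ABBD with d = 1.01 and e = 1.7494, a full third-order model with 20 parameters, and the determinant identity $E_m = |I_m - X_m(X'X)^{-1}X_m'|$. The function name `loss_missing` and the particular pairs chosen are hypothetical.

```python
import itertools
import numpy as np

d, e = 1.01, 1.7494
# Point sets of the v = 3 ABBD (order: 12 BBD, 8 factorial, 6 axial, 4 centre)
bbd = [p for p in itertools.product((-1.0, 0.0, 1.0), repeat=3)
       if sum(abs(c) for c in p) == 2]
fact = [tuple(d * s for s in sg)
        for sg in itertools.product((-1.0, 1.0), repeat=3)]
axial = [tuple(s * e if k == i else 0.0 for k in range(3))
         for i in range(3) for s in (-1.0, 1.0)]
centre = [(0.0, 0.0, 0.0)] * 4
pts = np.array(bbd + fact + axial + centre)

# Full third-order model: all monomials of total degree <= 3 (20 columns)
exps = [ex for ex in itertools.product(range(4), repeat=3) if sum(ex) <= 3]
X = np.column_stack([np.prod(pts ** np.array(ex), axis=1) for ex in exps])
XtX = X.T @ X

def loss_missing(rows):
    """L_m = 1 - |I_m - X_m (X'X)^{-1} X_m'| for the given missing row indices."""
    Xm = X[list(rows)]
    R_mm = Xm @ np.linalg.solve(XtX, Xm.T)
    return 1.0 - np.linalg.det(np.eye(len(rows)) - R_mm)

cases = {
    "two centre points": (26, 27),
    "two BBD points": (0, 1),
    "axial pair on one axis": (20, 21),   # (-e, 0, 0) and (+e, 0, 0)
}
for name, rows in cases.items():
    print(f"{name:24s} L2 = {loss_missing(rows):.3f}")

# Rank of the reduced model matrix when the axial pair on one axis is missing
rank_without_axial_pair = np.linalg.matrix_rank(np.delete(X, [20, 21], axis=0))
print("rank without the axial pair:", rank_without_axial_pair, "of", X.shape[1])
```

In this sketch, a loss of 1 together with a rank below the number of parameters signals that the reduced design can no longer estimate the full third-order model.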

Remark 3.1:
In sequential third-order rotatable designs, the maximum loss in information occurs when the observation pertaining to a design point at a higher radius from the design centre is lost, and the design then has minimum efficiency.
In the illustration given in section 3.2, missing any one run from 1 to 12 (Table 1), any one run from 17 to 24 (Table 3) or any one run from 25 to 30 (Table 3) would still allow the estimation of the model parameters of the third-order response surface model, with less influence on the efficiency in comparison to the design without missing observations. The same holds for missing any run from the centre points. For missing any two runs from 1 to 12 (Table 1), any two runs from 17 to 24 (Table 3) or any two runs from 25 to 30 (Table 3), fitting a full third-order response surface model would be impossible, as some of the cubic terms become aliased with other model parameters. It can be seen that missing the responses corresponding to runs 6 and 7 leads to non-estimability of the $C^3$ term, and missing the responses corresponding to runs 25 and 26 leads to non-estimability of the $A^3$ term, as these terms become aliased with other model parameters when the responses pertaining to the respective runs are missing during experimentation.
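The aliasing described in this remark can be inspected through the null space of the reduced model matrix. The sketch below rests on several assumptions that are not confirmed by the text: that the illustration uses the v = 3 ABBD with d = 1.01 and e = 1.7494 in coded units, that runs 25 and 26 are the axial pair (±e, 0, 0) on factor A, and that the model is the full third-order polynomial. The helper `term` is hypothetical.

```python
import itertools
import numpy as np

d, e = 1.01, 1.7494
bbd = [p for p in itertools.product((-1.0, 0.0, 1.0), repeat=3)
       if sum(abs(c) for c in p) == 2]
fact = [tuple(d * s for s in sg)
        for sg in itertools.product((-1.0, 1.0), repeat=3)]
axial = [tuple(s * e if k == i else 0.0 for k in range(3))
         for i in range(3) for s in (-1.0, 1.0)]
centre = [(0.0, 0.0, 0.0)] * 4
pts = np.array(bbd + fact + axial + centre)

exps = [ex for ex in itertools.product(range(4), repeat=3) if sum(ex) <= 3]
X = np.column_stack([np.prod(pts ** np.array(ex), axis=1) for ex in exps])

def term(ex):
    """Readable name of a monomial, e.g. (3, 0, 0) -> 'A^3'."""
    parts = [f"{n}^{k}" if k > 1 else n for n, k in zip("ABC", ex) if k > 0]
    return "*".join(parts) or "1"

# Assumed missing runs: the two axial points on factor A, (+-e, 0, 0)
keep = [i for i, p in enumerate(pts)
        if not (abs(p[0]) == e and p[1] == 0.0 and p[2] == 0.0)]
Xr = X[keep]

# The right singular vector of the zero singular value spans the null space:
# the linear combination of model terms the reduced design cannot separate
_, s, Vt = np.linalg.svd(Xr)
null_vec = Vt[-1]
aliased = {term(ex): float(c) for ex, c in zip(exps, null_vec) if abs(c) > 1e-8}

print("smallest singular values:", s[-2], s[-1])
print("terms involved in the aliasing:", sorted(aliased))
```

Under these assumptions the null vector involves only terms that are odd in A, including $A^3$, which is consistent with the non-estimability of $A^3$ stated in the remark.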

Summary and conclusions
The effect of missing observation(s) for sequential third-order response surface designs is studied, and the amount of information lost when one or two observations are missing in the smaller designs is also examined. The findings on design robustness against missing observation(s) are listed below:
• The maximum loss in information occurs when the observation(s) at the design points having higher radii from the design centre is lost.
• The effect of missing a centre point is less severe than the effect of missing a run at the BBD, factorial, axial, complement of BBD or fraction of BBD points.
• In ABBDs, the information loss is severe when missing one observation at an axial point for v = 3 and 6, and when missing one observation at a BBD point for v = 4, 5. For v = 7, the loss is severe on missing one observation at an axial point as well as at a complement of BBD point.
• Missing any two observations in ABBDs results in more than 70% loss in information, except at centre points.
• In AFBBDs, for v = 3, 4, 5 and 7, missing one observation at the fractional BBD $\tfrac{1}{2}BB[v]$ might not retain any information, as the reduced design results in non-estimability ($|X_1'X_1| = 0$), since the effect of a missing observation is more severe for smaller designs. For v = 6, some information is retained on missing one observation of the fractional BBD, since the combination constitutes a fractional complement of the original BBD.
Missing one observation at the BBD, factorial, axial or complement of BBD points gives rise to an estimability problem; thus, the reduced design is not suitable for optimization studies. Sequential third-order response surface designs with smaller runs (ABBDs or AFBBDs) are cost-effective. However, when one observation is lost in such smaller designs, the information loss is greater for a missing response at any design point other than a centre point; in particular, there is a complete loss of information when the missing response pertains to a run at the fractional BBD. Therefore, the experimenter should be cautious, since missing even a single observation in smaller designs may result in a complete loss of information and lead to non-estimability.