Graphical Evaluation of the Ridge-Type Robust Regression Estimators in Mixture Experiments

In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in the cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of biasing parameter, we use fraction of design space plots for evaluating the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on Hald cement data set.


Introduction
Mixture experiments, a special study area of response surface methodology (RSM), are performed in many areas of product development and improvement. The purpose of mixture experiments is to model the blending surface with some forms of the mathematical equation. In general, for modeling well-behaved mixture system, the specific models such as Scheffé canonical polynomials are used in developing the appropriate response model. The parameter estimates of the Scheffé canonical polynomials are obtained by the ordinary least squares (OLS) method. However, in mixture experiments, multicollinearity is more frequent than the other experimental studies because of the constraints of components composing the mixture. Therefore, OLS estimates of the Scheffé canonical polynomials may be statistically insignificant with wrong sign and their variances are large so they may be far from the true values due to constraints on the mixture components [1][2][3]. To reduce the effect of multicollinearity in the analysis of mixture experiments, variable transformation or ordinary ridge regression (ORR), suggested by Hoerl and Kennard [4], is widely used as a classical method.
In mixture experiments, the detailed application of ORR estimator is given by St. John [1]. To evaluate the performance of ORR estimator, the graphical approaches apart from the analytical methods have been proposed in mixture experiments. These graphical approaches are based on plots of the scaled prediction variance (SPV) trace, which is suggested by Vining et al. [5]. The plot of SPV traces is used to compare the quality of prediction for designs in mixture experiments [5,6]. Using the plot of SPV traces, Jang and Yoon [7] suggested a graphical method for evaluating and selecting an appropriate biasing parameter that balances the bias-variance tradeoff. With this approach, the SPV values are plotted along prediction rays that correspond to Cox directions. On the other hand, Jang and Anderson-Cook [8] used fraction of design space (FDS) plots, which is developed by Zahran et al. [9], and violin plots to illustrate and evaluate the effect of ORR estimator with respect to the SPV values and to guide the decision about the value of biasing parameter .
In mixture experiments, apart from the problems due to multicollinearity, another important problem which needs attention in parameter estimation is the presence of outliers in the data set [3,10,11]. In this case, outliers affect the efficiency of OLS estimates [12]. To overcome this problem, robust 2 The Scientific World Journal estimation methods like -estimator, least median of squares (LMS) estimator, least trimmed sum of squares (LTS) estimator, -estimator, and MM-estimator are proposed (see [12][13][14][15][16]). However, although robust estimators are the resistant techniques used to obtain the parameter estimates which are not affected by outliers, they are frequently unstable when the design matrix is ill-conditioned [12]. Therefore, effects due to both outliers and multicollinearity in the analysis of mixture experiments can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust estimators for the estimation of the coefficients. In literature, it is recommended to use alternative ridgetype robust estimators for both outliers and multicollinearity problem in data set (see [17][18][19][20][21][22]).
The purpose of this study is to apply and evaluate various ridge-type robust estimators in the analysis of mixture experiments. Also, the performance of ridge-type robust estimators is dependent on the biasing parameter. In this case, to determine the biasing parameter apart from the analytical methods, alternative plots which are used in comparison and evaluation of the designs and which are based on the SPV values or the scaled mean squared error of prediction (SMSEP) values can be used. Considering the SPV values in evaluating and comparing designs is important, but bias properties of designs should also be considered [23]. Therefore, our study will focus on the use of FDS plots as a graphical tool for evaluating the effect of various ridge-type robust estimators with respect to the SMSEP values and for guiding the decision about the value of the biasing parameter.
This paper is organized as follows. In the next section, the concise information about the analysis of mixture experiments with robust methods is given. In Section 3, the ridge-type robust estimators based on the -estimator, GMestimator, and MM-estimator are introduced. In Section 4, the FDS plots are introduced for the selection of biasing parameter . The suggested approach is illustrated on Hald cement data in Section 5. Finally, the obtained results are given in Section 6.

Mixture Experiments
In mixture experiments, the measured response depends only on the proportions of the components present in the mixture and not on the amount of the mixture. For example, the measured response might be the strength of concrete which is a mixture of cement, sand, and water, or it might be octane rating of a blend of gasoline. The aim of these experiments is to find optimal component proportions that provide desired values of some product performance characteristics [24].
The mixture constraints (1) produce a regular ( − 1)dimensional simplex experimental region. However, the experimental region (1) is sometimes subjected to additional constraints on individual mixture components of the form where and denote lower and upper bounds, respectively. In general, restriction (2) reduces the experimental region given by (1) to an irregular ( − 1)-dimensional hyperpolyhedron. Mixture experimental designs are discussed by Cornell [24] for regular and irregular simplex regions. In general, the presence of natural constraints or/and some additional linear constraints between the components requires the use of specific polynomial models such as Scheffé canonical polynomials for the modeling of the mixture system [25]. For modeling well-behaved mixture system, generally the Scheffé canonical polynomials are sufficient [24]. The first-and second-degree Scheffé canonical polynomial are given as As usual, we can represent the Scheffé canonical polynomial models in matrix form by where y is × 1 vector of observations on the response variable, X is × (≥ ) matrix, where is number of the terms in the model, is × 1 vector of parameters to be estimated, and is × 1 vector of random errors satisfying ( ) = 0 and ( ) = 2 I. The vector of unknown parameters can be generally estimated using OLS method. The OLS estimator for iŝO LS = (X X) −1 X y.
When the observations y in model (4) are normally distributed, the OLS is a good parameter estimation procedure in the sense that it produces an estimator of the parameter vector that has good statistical properties [12]. For example, OLS is the best linear unbiased estimator of and also the maximum-likelihood estimator of . On the other hand, there are many situations where the distribution of the response variable is nonnormal. Under conditions of nonnormal distributions, particularly heavy-tailed distributions, OLS estimator no longer has these desirable properties. These heavy-tailed distributions tend to generate outliers, and these outliers may have improper effect on the OLS estimates. To deal with this, several robust-to-outliers methods have been proposed in the literature [12,16].
Robust regression methods are well-designed statistical techniques to produce regression coefficients which are not sensitive to outliers and nonnormal error distributions [13]. The most popular of all robust estimation techniques is estimation, proposed by Huber [26]. The -estimator is defined aŝ= where denote the residuals as = (̂) = − x̂, where x denote the th row of X, and is a robust estimate of scale. The scale is required to gain scale equivariance and either can be an external scale estimate or can be estimated simultaneously. The function is related to the likelihood function for an appropriate choice of the error distribution. Different choices for lead to different variants of estimators. As with most -estimation procedures, both the Huber weight function and the Tukey biweight function are two common choices [16].
To minimize (6), equate the first partial derivatives of with respect to each parameter which leads to equations of the form where (⋅) is the partial derivative of with respect to parameters. In general, the function in (7), a nonlinear function, must be solved by iterative methods. The most common solution is iteratively reweighted least squares (IRLS), resulting in an estimator where W is an × diagonal matrix of weights = ( / )/( / ), = 1, 2, . . . , . IRLS estimation procedure updates W matrix until a convergence criterion has been met [12].
On the other hand, -estimators are not robust to high leverage points. As a remedy to this problem, boundedinfluence generalized -estimators (GM-estimators) have been proposed. The most commonly used GM-estimator is the Schweppe-type estimator. Handschin et al. [27] proposed an objective function with associated normal equations of the form where, for appropriate values of = (x ), the GMestimator can downweight outliers with high leverage points. Several suggestions for have been made that involve typical OLS outlier diagnostics, including studentized residuals or DFFITS [28]. The resulting estimator is (8), except the weights which become = ( / )/( / ). In the literature, several GM estimation approaches are suggested using various combinations of GM components (objective function, initial estimate, scale estimate, function, etc.) [29,30].
The breakdown point and efficiency are two practical concerns that should be taken into account when selecting a robust estimation procedure [12,16]. According to these properties, GM-estimators possess the same efficiency and asymptotic distributional properties as -estimators. But their breakdown points may not be much better than the worst case 1/ . In these cases, both -estimation and GMestimation can be improved by starting with a good initial estimate. In general, an alternative desirable initial estimator with a high breakdown point can be -estimator [29,30].
-estimator proposed by Rousseeuw and Yohai [14] is based on minimizing the dispersion of the residuals. The objective function for -estimation iŝ where the dispersion function̂is found implicitly as the solution to where is a tuning constant defined as = Φ [ ( )] and Φ represent the standard normal distribution. The solutionî s called an -estimator of scale. But it is impossible forestimator to achieve a high breakdown point as well as a high efficiency [16].
Other most commonly used robust estimators, proposed by Yohai [15], are MM-estimators, which combine high breakdown with high asymptotic efficiency. Estimates are a modification of the typical -estimates and are obtained through a three-step process [16]. Let̂be an -estimator, and let̂be the corresponding -estimator of scale, solving (11) for =̂. The MM-estimator is then defined as local solution of̂M obtained with IRLS and initial valuê. The initialestimator guarantees a high breakdown point and the final MM-estimate a high Gaussian efficiency. In both the initialestimator and the final MM-estimator, a standard choice of is Tukey biweight function. The tuning constant can be set to 1.547 for the -estimator to guarantee a 50% breakdown point, and it can be set to 4.685 for the second step MMestimator in (12) to guarantee a 95% efficiency of the final estimator [16].

Ridge-Type Robust Estimators
Multicollinearity and outliers are two main problems for researchers using regression estimation methods. In the presence of multicollinearity, several alternative estimation techniques are proposed. One of them is the ORR estimator which was proposed by Hoerl and Kennard [4]. The ORR estimator which is the most popular method for dealing with multicollinearity is defined bŷ 4 The Scientific World Journal where is called biasing parameter. SincêO RR depends on̂O LS ,̂O RR is sensitive to outliers. Therefore, outliers may have also improperly effect on the ridge estimates. The problem is further complicated when both outliers and multicollinearity are present in the data [31]. In recent years, major efforts have been made to obtain reliable estimates especially in the presence of outliers and also multicollinearity. When both outliers and multicollinearity occur in a data set, it would seem beneficial to combine methods designed to deal with these problems individually. It is hoped that the resulting ridge-type robust estimators will be resistant to outliers and less affected by multicollinearity [19,21,31].
A ridge-type -estimator, suggested by Silvapulle [19], is defined bŷ wherêis the -estimators of . Silvapulle [19]  On the other hand, Maronna [22] stated that Z matrix and̂estimator in (14) are also sensitive to leverage points. Therefore, a ridge-type robust estimator which is not sensitive to both -and -outliers is given bŷ wherêis one of the robust estimators (such as GMestimator or MM-estimator) of . In this case, the use of ridge-type robust estimators is a good method for obtaining more stable parameter estimates for the model. The ridge-type robust estimator given in (15) depends on the biasing parameter . For selection of biasing parameter, Silvapulle [19] proposed the two different types of adaptive choices based on Hoerl et al. [32] (abbreviated to HKB) and Lawless and Wang [33] (abbreviated to LW). The same results can be extended tôR R estimators. The resulting two choices of arêH is the th eigenvalue of X WX. Also for ORR estimator, there are several methods for selecting the biasing parameter in literature. Most of these methods have been reviewed by Clark and Troskie [34]. Recommended methods can be similarly extended to the ridge-type robust estimators.

The Graphical Method for Evaluating Biasing Parameter
Due to the fact that there is not a definite rule for selection of biasing parameter, the graphical approaches developed for the ORR estimator can be used to select the biasing parameter of the ridge-type robust estimators. Different than analytical methods, determination of the biasing parameter with graphical approach is useful since it helps the researcher to examine the prediction properties of design based on the different biasing parameters visually. The prediction properties of a design, specifically the SMSEP values, also depend on the fitted model. Therefore multicollinearity due to the fitted model also influences the performance of the prediction properties of design completely. In this case, to examine the graphical approaches developed for comparison of the performance of designs based on the different biasing parameter for a single design is a useful method [7,8].
For using the suggested methods on the analysis of mixture experiments, firstly we define the predicted response at x . When we usêR R estimator, the predicted response value at x is given aŝ The mean squared error of̂(x ), also known as the mean squared error of prediction (MSEP), is given by where

Bias[̂(x )] = [̂(x )] − (x ) is the bias associated with estimatinĝ(x ).
The variance of̂(x ), also known as the prediction variance, is given by where var[̂] is the variance-covariance matrix of̂. There are various approximations for determination of covariance matrix of the robust estimators in the literature [13]. Huber [13] has shown that a reasonable approximation for the covariance matrix of̂is var[̂] ≈ 2̂2 (X X) −1 , where
Using (17), the bias of̂(x ) at x is approximately given by By combining the result in (19) and (20), an approximate expression for the MSE of̂(x ) is given by Note that MSE[̂(x )] is a function of the biasing parameter , the location of mixture x in the design space, the model matrix, and the weights matrix. It should also be noted that  (21), the unknown parameters can be replaced by suitable estimates of the parameters.
When design comparison focuses on prediction, the MSEP is generally scaled by / 2 to create a scale-free quantity that reflects the design size. The scaled mean squared error of prediction (SMSEP) at the th observation for a ridge-type robust estimator is written as where denotes the design size. The multiplication by serves to adjust the MSE for the total number of observations. This practice, which is quite common in RSM, makes it possible to use the SMSEP as a design criterion for comparing several designs [6].
In To evaluate the distribution of SMSE[̂(x )] values on the constrained experimental design, the FDS plots which are used in comparison and evaluation of the designs can be used. The main advantage of the use of FDS plots is to display characteristics of SMSE[̂(x )] values throughout a multidimensional design region on a two-dimensional graph. Therefore, we can use the FDS plots as a graphical tool for evaluating the effect of ridge-type robust estimators with respect to the SMSEP values and for selecting an appropriate value of the biasing parameter . In this case, instead of just considering a single FDS curve for a design, we examine the impact of the biasing parameter on the MSEP performance throughout the experimental region.
To calculate the FDS, two sampling methods with nearly same results are generally explored to sample mixture points x from the design space [35]. The first uses shrunken regions by Piepel et al. [36] and randomly selects a number of points on each shrinkage level proportional to that level. The second method selects points completely at random that fit within the constraints of the region [35]. In this study, we will use second method for generation of the sample points on the experimental region. Once the points are generated,

Example: The Hald Cement Data
To evaluate the performances of ridge-type robust estimators, we will draw on a mixture experiment given by Piepel and Redgate [37]. Piepel and Redgate [37] pointed out that the data from a cement example published by Hald [38] can be analyzed in the framework of the mixture experiments. The purpose of the experiment is to investigate the influence of the components on the heat of hardening (calories/gram), measured after the cements cured for 180 days. The data given in Table 1 contains the values (weight fractions, wf) computed from the 13 oxide weight percent clinker compositions. Additional information on the analyzing of the Hald cement experiment can be found in Piepel and Redgate [37] and Smith [3].
When we use first-degree Scheffé canonical polynomial, the obtained model is as follows: where the quantities in parentheses are estimated as standard errors of the coefficient estimates. Statistics for detecting influential observations for the Hald cement data set are given in Table 2.
Based on the result of Table 2, 8 ordinary residual seems suspiciously large. Also, observation 3 has a high leverage value. The parameter estimates of the robust estimators are given in Table 3. In Table 3, the quantities in parentheses are estimated as standard errors of the coefficient estimates and AICR is the robust version of Akaike Information Criterion which is used for model selection [16]. Table 4 displays the final weights for robust estimators. In -estimation, there is considerably downweight only at point 3. In case of GM-estimation, both point 3 and point 8 are significantly downweight. On the other hand, all observations are moderately downweight in MM-estimation. But, we note that neither point 3 nor point 8 is significantly downweight in MM-estimation, despite the fact that these points have the largest Cook's distance and ordinary residual, respectively.
In addition, the condition number of X X matrix is 101.034. So, the design matrix X is moderately ill-conditioned. Therefore, due to the multicollinearity and outliers, the ridgetype GM-estimator or the ridge-type MM-estimator can be used. For the ridge-type GM-estimator, we can calculate the biasing parameter aŝH KB ≈̂L W = 0.00001. For the ridgetype MM-estimator, the biasing parameter can be estimated aŝH KB = 0.00007 and̂L W = 0.00002 based on the methods proposed by Silvapulle [19].
To evaluate the biasing parameter for the ridge-type GM-estimator and the ridge-type MM-estimator, randomly generated mixture points on the experimental region are considered. For different values, the FDS plots of SMSE[̂(x )] values based on̂G M and̂M M estimators are given in Figures  1 and 2, respectively.
In Figures 1 and 2, as the biasing parameter increases, the distribution of SMSE[̂(x )] values initially decreases gradually and then increases dramatically. In this case, we 6 The Scientific World Journal   Table 5.
In Tables 3 and 5, we observe that MSE[̂= 0.00001 (x )] < MSE[̂(x )] is achieved for the ridge-type GM-estimator. Also, when we select the biasing parameter as = 0.0001, the same result is obtained for the ridge-type MM-estimator. A more comprehensive comparison of the ridge-type GMestimator and the ridge-type MM-estimator under MSEP sense is given in Figure 3.
From . As a result, which ridge-type robustestimator is the best seems to be dependent on the robust estimator of and the choice of biasing parameter . We have to replace these unknown parameters by some suitable estimates in practice. Therefore, the results of the graphical findings may change.

Conclusions
In this study, we used various ridge-type robust estimators to reduce the effect of the multicollinearity and outliers on the model parameters to certain extent in the analysis of  mixture experiments. We observed that the performance of the ridge-type robust estimator depends on the robust estimator of and the biasing parameter . In this study, as robust estimator of , the -estimator, GM-estimator, and MM-estimator were discussed. By selecting the robust estimators which are less sensitive to points outlying in the -and/or -direction will improve the performance of the ridge-type robust-estimator. On the other hand, depending on the biasing parameter, to evaluate the performance of the ridge-type robust estimators, we adapted extensive  approaches used in literature. Therefore, for evaluation of biasing parameter, the FDS plots which are based on the SMSEP values can also be used for both the mixture models affected by multicollinearity and outliers and other biasedrobust estimators.