A robust moving average iterative weighting method to analyze the effect of outliers on the response surface design

Article history: Received 25 October 2010; received in revised form 1 April 2011; accepted 20 April 2011; available online 1 May 2011.

This paper discusses the effect of outliers and trends on the response surface design fitted to experimental results. The common way to analyze a response surface is to fit a polynomial regression to the response variable by the ordinary least squares method and to find the significant controllable variables by ANOVA. In this case, outliers can have a confusing effect on the regression model derived from the experimental results and lead to a wrong interpretation of the data. The proposed moving average iterative weighting (MAIW) method is a robust approach that decreases the effect of these faulty points by considering the previous data to detect outliers or probable trends in the residuals. The iterative weighting method is used to estimate the coefficients of the regression model, and a numerical example illustrates the proposed approach.

© 2011 Growing Science Ltd. All rights reserved.


Introduction
In many cases, especially in experimental results, some erroneous data points appear as outliers. These points, which may occur for different reasons such as reading faults by the operator, make the results confusing to interpret. A common method of explaining and analyzing the results of experiments is to use response surface design. This term is used for a regression equation that describes the relationship between the control variables and one or more responses. We can use the estimated function to predict the response according to the controllable factors. Once we have performed an experimental design and the experiments, we need to do a statistical analysis to select the appropriate values for the input variables in an attempt to optimize the output. This can be done by fitting a regression model between the controllable factors and the response variables.
Future interpretations are based on this regression model, so an accurate model is very important and may affect the optimization stage. This model is generally constructed by the ordinary least squares (OLS) method. OLS is very sensitive to outliers, and they may have an inordinate effect on the ultimate conclusion. The detection and accommodation of outliers has been studied for many years. Outliers can lead to a wrong model, but the cause of an outlier is not as important as its effect, so we follow methods that can moderate their effects. The trend in residuals is another problem in response surface design. This issue is associated with model adequacy checking. In general, it is always necessary to examine the fitted model to ensure that it provides an adequate approximation to the true system and to verify that none of the least squares regression assumptions is violated. In any case, a better coefficient estimate leaves no specific trend in the residual analysis.

There are some primary tools for outlier detection, such as the half-normal probability plot, which can be used in the presence of a single outlier and can be categorized as visual tools. Other methods focus on significance tests to detect the outliers. Marrona et al. (2006) emphasized that the OLS method is very sensitive to outliers, so alternative methods such as least absolute deviations, robust partial least squares or generalized linear models are used instead of OLS to decrease the outliers' effects; robust approaches simplify the task of outlier identification by downweighting the large residuals. Wisnowskia et al. (2001) analyzed multiple outlier detection procedures for the linear regression model.

A robust and efficient response surface is the goal of many studies. Hejazi et al. (2010) proposed a novel approach based on goal programming to find the best combination of factors for optimizing multi-response, multi-covariate surfaces by considering location and dispersion effects. Kazemzadeh et al. (2008) proposed a method to optimize multi-response surfaces based on goal programming. Bashiri et al. (2009) studied multiple simulation response surfaces for robust optimization in an inventory system. Huber (1981) proposed M-estimators to obtain robust regression. Morgenthaler et al. (1999) discussed robust response surfaces in chemistry based on design of experiments. Hund et al. (2002) presented methods of outlier detection and evaluated robustness tests with different experimental designs. Bickela (2006) compared robust estimators and their applications. The M and GM estimators are computed by iterative procedures; therefore, several authors (e.g., Cummins & Andrews, 1995) refer to these estimators as iteratively reweighted least squares (IRLS). Ortiz et al. (2006) discussed some of the robust methods used for robust regression in analytical chemistry. Another useful robust method is least median of squares (LMS), proposed by Rousseeuw (1984), and a further useful method is least trimmed squares (LTS), proposed by Rousseeuw and Leroy (1987).
The Fast-LTS algorithm was developed by Rousseeuw and van Driessen (2006) and is probably the best one in practice. The method can be viewed as a combination of a gradient method and a genetic algorithm. Nguyena (2010) studied outlier detection and proposed a new least trimmed squares approximation. Recently, a "partial" version of the M-estimator based on the "fair" ψ function and an appropriate weighting scheme was proposed by Serneels et al. (2005). The authors claim that partial robust M-regression outperforms existing methods for robust partial least squares regression. In order to obtain a robust method with higher efficiency, Siegel (1982) proposed the repeated median estimator. Massart et al. (1986) explained the advantages of its use in chemical analysis. Bertsimas and Shioda (2007) presented mixed integer programming (MIP) models for classification and robust regression problems. Zioutas and Avramidis (2005) presented the effect of deleting outliers in a regression model obtained by MIP and compared its performance with LS and LMS. Another new method in robust regression is the mixed linear model surveyed by Dornheim and Brazouskas (2011). Pop and Sârbu (1996) proposed a new fuzzy regression algorithm to obtain a robust model. Marrona et al. (2006) proposed many M-estimators used in robust regression methods for both single and multiple responses. Wiens (2010) presented a comparative study of robust designs for M-estimated regression models. This study tries to find a better regression function with adequate residual analysis results based on the moving average, and the result is compared with that of the M-estimator method. For better illustration of the proposed method, the literature review is classified in Table 1.

This paper is organized as follows. In section 2, a detailed analysis based on the least squares fit is discussed. In section 3, modifying the response surface by an iterative weighting procedure is presented. The moving average method is presented in section 4. The robust method based on the moving average and iterative weighting, which is the proposed approach, is presented in section 5. A numerical example illustrates the proposed approach in section 6, and section 7 summarizes the contribution of this paper.

Ordinary-least-squares
The least squares method is used to estimate the regression coefficients in a multiple linear regression model where there are $n$ observations, $y_i$ is the response and $x_{ij}$ denotes the $i$th observation of variable $x_j$. The error term $\varepsilon_i$ represents the error in the model, and it is supposed that the error (residual) has a normal distribution with mean 0 and variance $\sigma^2$ and that the $\varepsilon_i$ are uncorrelated random variables. The model equation can be written as follows:

$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, 2, 3, \ldots, n. \qquad (1)$$

The aim is to choose the coefficients $\beta_j$ to minimize the sum of the squares of the errors $\varepsilon_i$. The least squares function is

$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2 .$$

The function $L$ is minimized with respect to the coefficients $\beta_0, \beta_1, \ldots, \beta_k$, and the least squares estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ must satisfy

$$\left. \frac{\partial L}{\partial \beta_j} \right|_{\hat{\beta}_0, \ldots, \hat{\beta}_k} = 0, \qquad j = 0, 1, \ldots, k .$$

There are $k+1$ equations, and solving them simultaneously determines the values of all the coefficients. If the hypotheses on $\varepsilon$ are satisfied, the LS estimate has the minimum variance in the class of all unbiased linear estimates. If, in addition, the error $\varepsilon$ is normally distributed, then this estimator has minimum variance among all unbiased estimators. Linearity is a significant restriction; many maximum likelihood estimators (e.g., under logistic and all t-distributions of the errors) are not linear. The rejection of outliers is also a non-linear operation. In fact, the LS estimator is optimal in the class of all unbiased estimators only if the errors are normally distributed. Therefore, the restriction to linear estimators can be justified only by normality.
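
As a minimal sketch of the least squares fit in Eq. (1), the following Python example estimates the coefficients with NumPy and shows how a single contaminated response shifts them. The function name `ols_fit`, the synthetic two-factor data and the size of the injected error are assumptions made for illustration; they are not taken from the experiment analyzed in this paper.

```python
import numpy as np


def ols_fit(X, y):
    """Least squares coefficients (intercept first) for y = b0 + X b + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


# Hypothetical data: an exact linear response in two coded factors.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
true_beta = np.array([5.0, 2.0, -1.0])
y_clean = true_beta[0] + X @ true_beta[1:]

beta_clean = ols_fit(X, y_clean)

# Contaminate a single run (for example, an operator reading fault).
y_outlier = y_clean.copy()
y_outlier[3] += 20.0
beta_outlier = ols_fit(X, y_outlier)

print("clean fit:       ", np.round(beta_clean, 3))
print("one outlier fit: ", np.round(beta_outlier, 3))
```

With uncontaminated data the fit recovers the generating coefficients, whereas one faulty run is enough to move every estimate, which is the sensitivity that motivates the weighting approach of the next section.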

Iterative weighting and modifying
To compensate for the effects of outlying values, we can either remove the outlier data or modify them by weighting the residuals. The first approach is not rational, so we choose to modify them in order to decrease the effect of outliers in the coefficient estimation stage. The proposed idea is as follows:

$$y = f(x_1, \ldots, x_k;\ \beta_0, \beta_1, \ldots, \beta_k).$$

In this equation, $f$ is a function defined by the unknown coefficients ($\beta_j$). For example, when the coefficients are constants, the responses $y_i$ are obtained from the experimental results and the regression model describes the relationship between the variables $x_{ij}$ and the expected values of the $y_i$. If all the measurements are good, then the OLS method provides a reasonable model and the coefficients are estimated by minimizing the following expression:

$$\sum_{i=1}^{n} \big( y_i - f(x_{i1}, \ldots, x_{ik};\ \beta_0, \ldots, \beta_k) \big)^2 .$$

However, if the results appear abnormal, which may be a consequence of the residual behavior in the experiments, the coefficients are determined by minimizing the following weighted expression:

$$\sum_{i=1}^{n} w_i \big( y_i - f(x_{i1}, \ldots, x_{ik};\ \beta_0, \ldots, \beta_k) \big)^2 . \qquad (5)$$

The abnormality occurs when a residual behaves like an outlier.
The weights $w_i$ are not pre-assigned values because the quality of each $y_i$ is not known in advance. Reasonable values for the weights are based on the residuals defined by the following equation:

$$r_i = y_i - f(x_{i1}, \ldots, x_{ik};\ \hat{\beta}_0, \ldots, \hat{\beta}_k). \qquad (6)$$

The weights should be inversely proportional to the magnitude of the residuals, $|r_i|$. In other words, residuals with large values are weighted less, and this produces better coefficient estimates. These weights can be chosen by a function such as the Huber weight function:

$$w_i = \begin{cases} 1, & |r_i| \le c, \\ c / |r_i|, & |r_i| > c, \end{cases} \qquad (7)$$

where $c$ is a constant. The procedure is as follows: compute the initial coefficients of the regression model, compute the residuals and weights, and then compute the new coefficients by minimizing Eq. (5). Because the new coefficients yield new residuals and weights, the procedure can be repeated until a good solution is obtained. This procedure is known as iterative weighting OLS. The procedure terminates when the change in the estimates from one iteration to the next is sufficiently small. However, it is possible that some residuals fall below the specified limit and are therefore ignored in the coefficient estimation stage. Therefore, a new approach that identifies flexible limits is proposed in the next section.
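
The iterative weighting procedure described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' Matlab code: the function names `huber_weights` and `irls`, the tolerance, the iteration cap and the synthetic straight-line data are all assumptions, and the bound $c$ is applied to the raw residuals exactly as in Eq. (7), without any rescaling.

```python
import numpy as np


def huber_weights(r, c):
    """Huber weights of Eq. (7): 1 inside the bound, c/|r| outside it."""
    a = np.abs(np.asarray(r, dtype=float))
    w = np.ones_like(a)
    outside = a > c
    w[outside] = c / a[outside]
    return w


def irls(A, y, c=3.0, tol=1e-8, max_iter=100):
    """Iteratively reweighted least squares with a constant bound c."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(A, y, rcond=None)[0]       # step 1: OLS start
    w = np.ones(len(y))
    for _ in range(max_iter):
        r = y - A @ beta                              # residuals, Eq. (6)
        w = huber_weights(r, c)                       # weights, Eq. (7)
        sw = np.sqrt(w)                               # WLS via sqrt-weights
        beta_new = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:                                 # negligible change
            break
    return beta, w


# Hypothetical data: an exact line with one grossly wrong reading.
x = np.linspace(0.0, 1.0, 20)
A = np.column_stack([np.ones_like(x), x])
truth = np.array([1.0, 2.0])
y = A @ truth
y[5] += 10.0

beta_ols = np.linalg.lstsq(A, y, rcond=None)[0]
beta_irls, weights = irls(A, y, c=3.0)
print("OLS :", np.round(beta_ols, 3))
print("IRLS:", np.round(beta_irls, 3), "weight of run 5:", round(weights[5], 3))
```

Multiplying each row of the design matrix and response by the square root of its weight turns the weighted problem of Eq. (5) into an ordinary least squares problem, so the same solver serves every iteration.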

Moving average method
Suppose that we have individual process observations $x_1, x_2, \ldots$; then the moving average of span $m$ at time $t$ is defined by the following equation:

$$M_t = \frac{x_t + x_{t-1} + \cdots + x_{t-m+1}}{m}. \qquad (8)$$

That is, at time $t$ the oldest observation is dropped and the newest observation is added to the set. The variance of this statistic is calculated as follows:

$$V(M_t) = \frac{1}{m^2} \sum_{j=t-m+1}^{t} V(x_j) = \frac{\sigma^2}{m}.$$

In this method, the statistic is calculated for each new observation and compared with the upper control limit (UCL) and the lower control limit (LCL); if a statistic violates the limits, we can say that the process is not in control. If we want to detect a small shift, we can increase the span $m$.
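
A short Python sketch of the moving average statistic and its control limits is given below. The function names, the limit multiplier `L = 3`, and the treatment of the first runs (averaging all observations available so far, with correspondingly wider limits) are assumptions for illustration; the text above does not fix these details.

```python
import numpy as np


def moving_average(x, m):
    """Trailing moving average of span m, as in Eq. (8).

    For the first m-1 observations the average of all observations so far
    is used (an assumed convention). Returns the statistic and the number
    of observations in each window.
    """
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    t = np.arange(1, len(x) + 1)
    span = np.minimum(t, m)
    return (csum[t] - csum[t - span]) / span, span


def control_limits(center, sigma, span, L=3.0):
    """LCL and UCL at center -/+ L*sigma/sqrt(span); L = 3 is an assumption."""
    half_width = L * sigma / np.sqrt(span)
    return center - half_width, center + half_width


# Hypothetical observations with a jump in the last run.
obs = [10.0, 12.0, 11.0, 13.0, 20.0]
M, span = moving_average(obs, m=3)
lcl, ucl = control_limits(center=11.5, sigma=1.0, span=span)
out_of_control = (M < lcl) | (M > ucl)
print(np.round(M, 3))
print(out_of_control)
```

Because the half-width shrinks with the square root of the window length, a longer span gives tighter limits on the averaged statistic, which is why a larger span is better suited to detecting small shifts.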

Robust fitting response surface using moving average iterative weighting method (MAIW)
In this section, a method is proposed to moderate the outliers and the trends in the residuals without violating the accuracy of the model, and finally to estimate a robust regression model for the experiments. First, as mentioned in section 2, the initial estimate is obtained by the OLS method and the residuals are calculated. As mentioned in section 3, the weights are inversely proportional to the residual magnitudes; they are computed by Eq. (7), and the procedure continues until the changes in the coefficients are negligible. Since that procedure uses a constant bound for outlier detection, residual trends cannot be detected. It seems that by considering the variation of the previous residuals and their trends, a better estimate of the regression coefficients and consequently a more reliable analysis can be obtained. A flexible residual bound, used instead of the constant $c$ in Eq. (7), can be considered a more robust alternative, as Eq. (10) illustrates. We can compute the variation of the residuals in the experiment, and by selecting a proper span $m$ the residual bound can easily be computed. If the residuals are greater than the computed bounds at a specific run order, the weight is computed using the previous formulation and the iterative process continues. With these flexible bounds, we can find and modify a small shift faster. In our problem, we suppose that the residuals have a normal distribution with mean zero and standard deviation $\sigma$, and the residuals are computed for each observation by Eqs. (6) and (8). Then the weights are calculated by Eq. (10).

Fig. 1. Flowchart of the proposed MAIW
After the weights are computed, the coefficients are re-estimated by minimizing Eq. (5) with these weights, and the procedure continues. The method is explained in the next section through a numerical example, and the flowchart of the MAIW method is presented in Fig. 1.
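
The following Python sketch shows one way the MAIW loop could be organized. It should be read with caution: the exact form of the flexible bound in Eq. (10) is not recoverable from the text, so the rule used here is an assumption. Specifically, it is assumed that run $t$ is flagged when the moving average of the residuals leaves the band $\pm L\hat{\sigma}/\sqrt{m}$ and the residual itself exceeds that bound, and that a flagged run receives the Eq. (7) weight with $c$ replaced by the run-dependent bound. The names `maiw_fit` and `moving_average`, the multiplier `L`, the residual scale estimate and the synthetic data are likewise hypothetical.

```python
import numpy as np


def moving_average(r, m):
    """Trailing moving average of span m (shorter windows for the first runs)."""
    r = np.asarray(r, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(r)))
    t = np.arange(1, len(r) + 1)
    span = np.minimum(t, m)
    return (csum[t] - csum[t - span]) / span, span


def maiw_fit(A, y, m=3, L=2.0, tol=1e-8, max_iter=100):
    """Sketch of moving average iterative weighting under assumed bounds."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = A.shape
    beta = np.linalg.lstsq(A, y, rcond=None)[0]       # OLS start (section 2)
    w = np.ones(n)
    bound = np.zeros(n)
    for _ in range(max_iter):
        r = y - A @ beta                              # residuals, Eq. (6)
        sigma = r.std(ddof=p)                         # ASSUMED scale estimate
        M, span = moving_average(r, m)                # Eq. (8) on residuals
        bound = L * sigma / np.sqrt(span)             # ASSUMED flexible bound
        w = np.ones(n)
        flagged = (np.abs(M) > bound) & (np.abs(r) > bound)
        w[flagged] = bound[flagged] / np.abs(r[flagged])  # Eq. (7), c -> bound
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w, bound


# Hypothetical data: 27 runs on an exact line, one faulty reading at run 10.
x = np.linspace(0.0, 1.0, 27)
A = np.column_stack([np.ones_like(x), x])
truth = np.array([1.0, 2.0])
y = A @ truth
y[10] += 5.0

beta_ols = np.linalg.lstsq(A, y, rcond=None)[0]
beta_maiw, weights, bound = maiw_fit(A, y, m=3, L=2.0)
print("OLS :", np.round(beta_ols, 3))
print("MAIW:", np.round(beta_maiw, 3), "weight of run 10:", round(weights[10], 3))
```

Under this assumed rule the bound depends on the run order through the window length and is re-estimated in every iteration from the current residuals, which is the sense in which it is flexible rather than constant.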

Numerical example
Consider an experimental design that contains one response variable and four explanatory control variables. Each variable has three levels, and the primary objective of the study is to optimize the yield of a product. Table 2 shows the input data for our experiment. We want to explore the yield response surface by using a second-order regression model. A Box-Behnken design with 27 treatments is used for this experiment; blocking is used to decrease the effect of nuisance factors, and the blocks are assigned, for example, to three days. In order to determine the most important terms, we perform an ANOVA test, and the results are summarized in Table 3. Fig. 2 shows the residuals of the initial fitted model for the different runs.
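
For readers who wish to reproduce the structure of such an experiment, the sketch below builds a four-factor Box-Behnken design in coded units and the model matrix of a full second-order model. It assumes three center points, which gives the 27 runs mentioned above; the run order is the standard (not randomized) order, blocking is not included, and the data of Table 2 are not reproduced. The function names are hypothetical.

```python
from itertools import combinations, product

import numpy as np


def box_behnken(k, n_center):
    """Box-Behnken design in coded units (-1, 0, +1) for k factors."""
    runs = []
    for i, j in combinations(range(k), 2):            # every pair of factors
        for a, b in product((-1, 1), repeat=2):       # 2^2 factorial on the pair
            row = [0] * k
            row[i], row[j] = a, b
            runs.append(row)
    runs.extend([[0] * k for _ in range(n_center)])   # center points
    return np.array(runs, dtype=float)


def second_order_matrix(X):
    """Columns: intercept, linear, pure quadratic, two-factor interactions."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, j] for j in range(k)]
    cols += [X[:, j] ** 2 for j in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(cols)


design = box_behnken(k=4, n_center=3)
A = second_order_matrix(design)
print(design.shape, A.shape, np.linalg.matrix_rank(A))
```

The model matrix has 15 columns (one intercept, four linear, four quadratic and six interaction terms), so the 27 runs leave 12 degrees of freedom for the residuals before any block effects are fitted.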

Fig. 2. The least squares residuals of the model for different runs
As Fig. 2 shows, points 4, 7 and 16 appear to be outliers; we repeat the regression with these points omitted and summarize the results in Table 4.
In our study, the residuals show trend behavior over some periods of runs. To obtain a more reliable and more robust regression model, the moving average iterative weighting (MAIW) method is applied, and the results are compared with the method of Huber (1981), a common robust regression fitting method. Fig. 3 shows that with Huber (c = 3) the points outside the green lines are modified by weighting, whereas, as shown in Fig. 4, the proposed method uses flexible residual bounds.
Moreover, the proposed approach can identify residual trends in the last experimental runs in its first iteration, whereas previous approaches such as the Huber method cannot detect residual trends (as can be seen in Fig. 3 and Fig. 4). The iterative weighting method applied in this numerical example has been coded in Matlab 7.8, and the results of the Huber method are presented as well. The variation of the residuals in these methods is compared, and the coefficients and residuals are reported in Tables 5 and 6. As shown in Table 5, MAIW (span $m$ = 2) provides the best estimate among the methods in this case. Moreover, the variation between residuals for the proposed model is smaller than for the OLS method and for Huber (c = 3). Figs. 5, 6, 7 and 8 show the residual plots of Huber (c = 3), MAIW ($m$ = 9), MAIW ($m$ = 4) and MAIW ($m$ = 4), respectively.
If we decrease the value of the span $m$ in this model, the variation between residuals after coefficient estimation in the last model is reduced. Consequently, the mean square error (MSE) of the model decreases as well. Fig. 9 illustrates the results.

Conclusion
We have presented a new robust estimation method for response surface modeling. The main advantage of the proposed model is that it detects outliers using a moving average technique. The proposed moving average iterative weighting (MAIW) method is based on moving average residual bounds for the iterative weighting method, in which the flexible bounds for the residuals are intended to capture both small trends and outliers. We have examined the performance of the proposed model on some benchmark problems from the literature. The preliminary results indicate that the proposed model outperforms the existing methods in the literature. This work can be extended to coefficient estimation with non-equal residual variances, and we leave it as a future study.

Fig. 9. Comparison of MSE values of proposed method and other regression coefficient estimators

Table 1
A review of the robust regression models
OWDE: Outlier weighting during estimation; RMR: Robust multiple response; IM: Iterative method; RRB: Flexible (moving average) residual bound; CRB: Constant residual bound; OLS: Ordinary least squares; DM: Different M-estimators; MO: Multiple outliers; SO: Single outlier.

Table 4
The coefficients of OLS method in the case of omitting outliers

Table 6
The residuals of the OLS, Huber (c = 3) and MAIW (with three different spans $m$) methods