SOME ROBUST ESTIMATION METHODS AND THEIR APPLICATIONS

This study examines robust regression methods used to address problems that arise when the assumptions of the Least Squares Method (LSM), commonly used to estimate linear regression models, do not hold. Robust estimators are not unduly influenced by small deviations and discrepancies in the data. For this purpose, several robust regression techniques applicable when these assumptions fail are introduced, and their parameter estimation algorithms are analyzed. Regression models, coefficients of determination and average absolute deviations were calculated for the LAD, weighted M-regression, Theil regression and Least Median of Squares methods, and the results were discussed to determine which of these methods gave better results.


INTRODUCTION
Nowadays, with statistical analysis becoming ever more important, the LSM method remains one of the most widely used regression parameter estimation techniques. However, when a data set contains outliers, using the LSM method, whether these outliers are excluded from the data or included as they are, may give wrong results. In that case, regression methods that reduce the effect of outliers will yield more reliable results.

Studies on robust estimators started when the Least Absolute Deviation (LAD, L1) regression technique was put forward by Roger Joseph Boscovich in 1757. However, it was not used much, since it was too long and complicated to calculate (Birkes and Dodge, 1993). Later, with developments in computer programming, studies on robust regression resumed. Tukey in 1960 and Huber in 1964 studied robust regression; Huber's theoretical work of 1972-1973 was followed by Hampel's studies between 1973 and 1978 (Neter, Kutner and Nachtsheim, 1993). In a simple linear model, Theil (1950) proposed the median of pairwise slopes as an estimator of the slope parameter; later, Sen (1968) extended this estimator to handle ties. The Theil-Sen estimator (TSE) is robust with a high breakdown point of 29.3%, has a bounded influence function, and possesses high asymptotic efficiency. It is thus very competitive with other slope estimators such as the least squares estimator (Sen, 1968; Dietz, 1989; Wilcox, 1998). The TSE has been acknowledged in several popular textbooks on nonparametric and robust statistics, e.g. Sprent (1993) and Rousseeuw and Leroy (1986).

Alphanumeric Journal
The Journal of Operations Research, Statistics, Econometrics and Management Information Systems
ISSN 2148-2225
http://www.alphanumericjournal.com/

Estimation of regression parameters with the help of the Least Absolute Deviations Method (LAD, L1)
The LSM method chooses the estimators β̂0 and β̂1 so as to minimize the sum of squared errors (Genceli, 2001). The Least Absolute Deviations Method instead minimizes the sum of absolute errors. There is no closed-form expression for the L1 estimators, so an algorithm has been developed to calculate them. The idea of the algorithm is to find the best line among all the lines that pass through a given point (x0, y0). The following steps are followed to find the regression line for the L1 technique (Yorulmaz, 2003):
1. Generally, the first of the observation pairs is chosen as the starting point (x0, y0).
2. Using the chosen observation pair, the slope value for every other observation pair and the corresponding xi − x0 value are obtained.
3. The absolute values |xi − x0| corresponding to the slope values, ordered from smallest to largest, are found.
4. The cumulative sum of the |xi − x0| values found is calculated.
5. Half of the cumulative sum found in the previous step equals the critical value.
6. To find the slope value corresponding to the critical value, the cumulative sums of the previous step are scanned; the first observation whose cumulative sum exceeds the critical value is the point looked for, and the slope value corresponding to it (from step 3) is taken.
7. The original index of the point giving this slope value is determined; this point is the new starting point for the next step.
8. When two consecutive iterations give the same value, the process is stopped.
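The steps above can be sketched in Python as follows. The reconstruction of the weighted-median rule (slopes ordered, cumulative |xi − x0| compared with the critical value) is an interpretation of the algorithm as described, and all function and variable names are illustrative:

```python
import numpy as np

def lad_line(x, y, start=0, max_iter=50):
    """Iterative weighted-median sketch of the LAD (L1) line.

    `start` picks the first pivot observation (step 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pivot, prev_slope, b1 = start, None, 0.0
    for _ in range(max_iter):
        others = np.flatnonzero(np.arange(len(x)) != pivot)
        dx = x[others] - x[pivot]
        slopes = (y[others] - y[pivot]) / dx      # step 2: pairwise slopes
        order = np.argsort(slopes)                # step 3: order by slope
        w = np.abs(dx)[order]                     # |x_i - x_0| values
        crit = w.sum() / 2.0                      # step 5: half the cumulative sum
        k = int(np.argmax(np.cumsum(w) > crit))   # step 6: first point past the critical value
        b1 = slopes[order][k]
        if prev_slope is not None and b1 == prev_slope:
            break                                 # step 8: same slope twice, stop
        prev_slope = b1
        pivot = others[order][k]                  # step 7: new starting point
    b0 = y[pivot] - b1 * x[pivot]                 # line passes through the pivot
    return b0, b1
```

On a small data set with one outlier, the iteration settles on the line through the clean points rather than being pulled toward the outlier.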

Estimation of Regression Parameters through the Weighted M-Regression Technique.
In Huber's M-regression technique, a function ρ(e) of the error terms is minimized. When the ρ(e) function proposed by Huber (1973) is defined for the error terms, the estimates follow from the corresponding weighted equations (Jabr, 2005). Here k = 1.5 σ̂, where the robust scale estimate is σ̂ = Med(|ei − Med(ei)|)/0.6745 and Med(.) denotes the median value.
In Huber's M-regression technique, parameter estimations can also be calculated by using the Huber weight function. The sum of squared errors, Σ ei² (i = 1, …, n), is what LSM minimizes. When the weights wi are also taken into consideration, the function to minimize becomes Σ wi (yi − β0 − β1 xi)². Some important weight functions are summarized in Table 1. The u value in the functions given in Table 1 is calculated as u = ei/σ̂; sgn(.) in the Hampel weighted method is the sign function, and sin(.) in the Andrews weighted method is the sine function.
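The weight functions usually collected in such a table (Huber, Hampel, Andrews, Tukey) can be written out directly. Since Table 1 itself is not reproduced here, the tuning constants below (1.345; 1.7, 3.4, 8.5; 1.339; 4.685) are the values commonly used in the literature and are an assumption, not necessarily those of the paper:

```python
import numpy as np

def w_huber(u, k=1.345):
    # Huber weight: 1 inside the threshold, k/|u| outside.
    a = np.abs(u)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def w_hampel(u, a=1.7, b=3.4, c=8.5):
    # Hampel three-part weight (piecewise, redescending to 0).
    x = np.abs(u)
    xs = np.maximum(x, 1e-12)  # guard unused branches against 0-division
    return np.select([x <= a, x <= b, x <= c],
                     [1.0, a / xs, (a / xs) * (c - x) / (c - b)],
                     default=0.0)

def w_andrews(u, c=1.339):
    # Andrews wave: sin(u/c)/(u/c) inside |u| <= c*pi, else 0.
    a = np.abs(u)
    return np.where(a <= c * np.pi, np.sinc(u / (c * np.pi)), 0.0)

def w_tukey(u, c=4.685):
    # Tukey bisquare: (1 - (u/c)^2)^2 inside |u| <= c, else 0.
    a = np.abs(u)
    return np.where(a <= c, (1 - (u / c) ** 2) ** 2, 0.0)
```

All four give full weight near u = 0 and shrink (the last three all the way to zero) for large standardized residuals, which is what makes the resulting estimates robust.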
The following steps are followed to find the regression line with weighted M-regression techniques:
1. β̂0 and β̂1 estimates are found through the LSM method.
2. The residuals ei and the scale estimate σ̂ are then found by using these estimates.
3. The weight values are calculated.
4. New β̂0 and β̂1 estimates are found through the weighted LSM method.
5. The process is finished if the difference between successive estimates is < 0.001; otherwise steps 2-4 are repeated (Ergül, 2006).
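The five steps amount to an iteratively reweighted least squares loop. The sketch below uses Huber weights with k = 1.5 σ̂ and the MAD-based scale; it is a minimal illustration under those assumptions, not the paper's exact computation:

```python
import numpy as np

def weighted_m_fit(x, y, tol=1e-3, max_iter=100):
    """IRLS sketch of weighted M-regression for a simple linear model."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # step 1: ordinary LSM
    for _ in range(max_iter):
        e = y - X @ beta                          # step 2: residuals
        sigma = np.median(np.abs(e - np.median(e))) / 0.6745  # robust scale
        k = 1.5 * sigma
        a = np.abs(e)
        w = np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))   # step 3: Huber weights
        sw = np.sqrt(w)
        new = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]  # step 4
        done = np.max(np.abs(new - beta)) < tol   # step 5: stopping rule
        beta = new
        if done:
            break
    return beta[0], beta[1]                       # (intercept, slope)
```

With one gross outlier in otherwise linear data, the loop drives the outlier's weight toward zero and recovers the clean line.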

Estimation of Regression Parameters through Theil-Sen Method.
The Theil-Sen method is also called the Theil-Kendall or Theil method in the literature. The Brown-Mood method, which is also recommended for finding the slope, is fast but not very reliable; the Theil method, recommended especially for finding the slope coefficient, is therefore more useful. In this method, the linear regression model is expressed as yi = β0 + β1 xi + ei. Here β0 is the intercept parameter and β1 is the slope parameter, and these parameters are estimated. The simple linear regression model carries some assumptions for this estimation:
1. For each xi value, the error terms ei and the xi's are mutually independent.
2. The xi's are non-repeating and ordered as x1 < x2 < ⋯ < xn.
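Under this model, the Theil slope estimate is the median of all pairwise slopes, and the intercept can be taken as β̂0 = Med(y) − β̂1 Med(x), the form used later in the real-data example. A minimal sketch (names are illustrative):

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil slope: median of all pairwise slopes (x assumed non-repeating).

    Intercept: Med(y) - b1 * Med(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)]
    b1 = float(np.median(slopes))
    b0 = float(np.median(y)) - b1 * float(np.median(x))
    return b0, b1
```

Because a single outlier corrupts only n − 1 of the n(n − 1)/2 pairwise slopes, the median of the slopes, and hence the fitted line, is left unchanged on small clean-plus-outlier examples.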

Estimation of Regression Parameters through Least Median of Squares Method.
Least Median of Squares (LMS) regression is a robust method that is also used to detect outliers. It was put forward by Rousseeuw and developed by Rousseeuw and Leroy. The idea of the method is to minimize the median of the squared errors instead of their sum; the function to be minimized is min med ei² (Rousseeuw and Leroy, 1987). This estimator is robust to outliers in the direction of both x and y. Its breakdown point is 0.5, the highest possible breakdown point (Rousseeuw and Leroy, 1987).
The following steps are followed to find the regression line with the Least Median of Squares method:
1. β̂0 and β̂1 estimates are calculated for all point pairs.
2. For each calculated β̂0 and β̂1, the error terms for the n observation pairs are found, and the median of the squared error terms is taken.
3. The β̂0 and β̂1 estimates corresponding to the least of these medians of squares are taken.
4. The weighted LSM technique is applied by using weight values obtained from the LMS fit: wi = 1 if |ei/s0| ≤ 2.5 and wi = 0 otherwise, where s0 is a robust scale estimate computed from med ei² (Rousseeuw and Leroy, 1987). The coefficient of determination is then found from this weighted fit.
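Steps 1-3 amount to an exhaustive search over all point pairs. A minimal sketch (the weighted refit of step 4 is omitted, and names are illustrative):

```python
import numpy as np
from itertools import combinations

def lms_line(x, y):
    """LMS sketch: try the line through every point pair and keep the
    one that minimizes the median of the squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_med, best_b0, best_b1 = np.inf, 0.0, 0.0
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                                # vertical line, skip
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        med = float(np.median((y - b0 - b1 * x) ** 2))  # median of e_i^2
        if med < best_med:
            best_med, best_b0, best_b1 = med, b0, b1
    return best_b0, best_b1
```

With n = 6 observations this evaluates C(6, 2) = 15 candidate lines, matching the count used in the real-data example below.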

REAL DATA EXAMPLE
In this application, rainfall between the years 1970 and 1975 and annual sugar production yields are analyzed. The response variable (Y) was taken as yield, while the independent variable (X) was taken as rainfall (Clarke and Cooke, 1992). The assumptions must be verified before the LSM method can be applied. The Q-Q graph of the error terms can be checked to assess visually whether the normality assumption holds. When Figure 1 is analyzed, it can be seen that although the Q-Q graph is one of the goodness-of-fit checks, results can be misleading in samples of such small size; in samples of this size, both visual checks and formal goodness-of-fit tests can mislead. Hence, even though the data appear normally distributed, using robust methods rather than the LSM method will give more reliable results.
Parameter estimation results of the L1 technique for the simple linear regression model given in Model (1) are summarized in Table 2.
For the LAD technique, iterations were continued until the same slope value recurred. The slopes of the 3rd and 4th iterations were found to be equal, so the process stopped after 4 iterations. The weight values in Table 3 were found by using the Huber-M weight function in Table 1; the best estimate was then obtained, and the first and final iteration results were summarized in Table 4. The weight values in Table 5 were calculated by using the weight function of the Hampel M-weighted regression technique in Table 1, and the results obtained after 16 iterations were summarized in Table 6.
The results obtained after 12 iterations were summarized in Table 8. The regression equation estimated after the 12 iterations of Andrews weighted regression is ŷ = 50.210 + 0.715x, and according to this technique the amount of rainfall explains 10.2% of the variance of yield. The weight values in Table 9 were calculated by using the weight function of the Tukey weighted regression technique in Table 1, and the results of the 7 iterations were summarized in Table 10. The regression model estimated after the 7 iterations of the Tukey weighted regression method is ŷ = 39.099 + 1.420x, and according to this technique the amount of rainfall explains 80.6% of the variance of yield. For the LMS method, the slope β̂1 = (yj − yi)/(xj − xi) for xi < xj and the intercept β̂0 = y0 − β̂1 x0 were computed for all possible point pairs, and the median of the squared errors, med ei², was calculated for each pair. Accordingly, β̂0 and β̂1 with their med ei² values were calculated for all C(6, 2) = 15 possible pairs in Table 11. The medians of the squared errors of these regression parameters were then found as in Table 12, and the estimates with the least med ei² value were taken as the LMS regression coefficients. The regression coefficients of the weighted LSM technique for the LMS method were calculated by using the coefficients obtained by the LMS technique; according to this method, the amount of rainfall explains 80.1% of the variance of yield.
When Table 11 is examined for the Theil method, the median of all possible slopes is taken as the β̂1 estimate, which gives β̂1 = 1. The intercept is then calculated as β̂0 = Med(y) − β̂1 Med(x) = 62.5 − 1 × 21 = 41.5.

CONCLUSION AND RECOMMENDATIONS
In this study, the regression line, standard error, coefficients of determination and average absolute deviations were calculated and interpreted for the regression models and parameter estimates of simple linear robust regression techniques applied to real-life data. According to the results, the method that gave the best result in terms of the percentage of the dependent variable explained by the independent variable was the Tukey weighted regression method. Although the weighted least median of squares method came close to the Tukey weighted regression method, its R² was found to be slightly lower. The explanation percentage obtained by the non-weighted least median of squares method was calculated as R² = 0.712.

Figure 1. Q-Q graph of the error terms found in the application.

R² was calculated as 0.671712; in other words, according to the LAD technique, rainfall accounts for 67.1% of the variance of yield.
As a result, the regression line of LMS was obtained as ŷ = 30.778 + 1.778x. The coefficient of determination was calculated as R² = 0.711; according to this method, the amount of rainfall explains 71.1% of the variance of yield.

Table 1. Some weight functions for the estimation of the simple linear regression model.

Table 2. Analysis results for the L1 technique (results of the first iteration).

Table 6. Hampel-M weighted regression results. The regression equation estimated after the 16 iterations of the Hampel-M weighted regression technique is ŷ = 45.875 + 0.989x; according to this technique, after the final iteration the amount of rainfall explains 23.7% of the variance of yield.

Table 12 .
Median results of error squares in LMS regression analysis.

Table 13 .
Weighted LSM technique for LMS method.

Table 14 .
Weighted LSM technique for LMS method.