The Robust Regression Methods for Estimating the Finite Population Mean Based on SRSWOR in Case of Outliers

Abstract: The ordinary least squares (OLS) method is commonly used in regression analysis, but its results are unreliable when the data contain outliers. Robust regression methods have therefore long been suggested as alternatives to OLS for handling outliers. In the present study, new ratio-type estimators of the finite population mean are suggested under simple random sampling without replacement (SRSWOR), utilizing the supplementary information in Bowley's coefficient of skewness with quartiles. For these proposed estimators, we use the OLS, Huber-M, Mallows GM-estimate, Schweppe GM-estimate, and SIS GM-estimate methods to estimate the population parameters. The mean square error (MSE) expressions of the various estimators are derived theoretically and compared with those of the OLS competitor. Simulations for skewed distributions such as the Gamma distribution support the results, and an application to a real data set containing outliers is presented for illustration.


Introduction
The OLS method is widely used to estimate the parameters of a linear regression model, which has a wide range of real-life applications provided that the OLS assumptions are satisfied. In many cases these assumptions are violated because of the nature of the data under consideration, especially when outliers occur. Several robust regression methods have therefore been suggested to overcome this problem. Among the commonly known ones is the least absolute deviations (LAD) method, regarded as the first step toward robust regression methods [Nadia and Mohammad (2013)]. The least median of squares method was suggested and improved by Rousseeuw et al. [Rousseeuw and Leroy (1987)]. Further methods include the least trimmed squares method; the Huber-M method, introduced by Huber [Huber (1973)]; the Hampel-M method, suggested by Hampel [Hampel (1971)]; the Tukey-M method, proposed by Tukey [Tukey (1977)]; and the Huber-MM method of Yohai [Yohai (1987)]. In this study, we consider the following generalized M (GM) methods: the Mallows GM-estimate, proposed by Mallows [Mallows (1975)]; the Schweppe GM-estimate, introduced by Handschin et al. [Handschin, Kohlas, Fiechter et al. (1975)]; and the SIS GM-estimate, proposed by Coakley et al. [Coakley and Hettmansperger (1993)]. These methods are described in the next section. The Huber-M method was adopted by Subzar et al. [Subzar, Bouza, Maqbool et al. (2019a)] in the case of outliers and compared with the OLS method; the Huber-M estimation was shown to perform better than OLS. In the current study, we adopt the generalized M-estimation methods and compare them with the OLS and Huber-M estimation. Suppose that Y is a study variable and X is an auxiliary variable that is correlated with Y.
Also, let $\bar{Y}$ and $\bar{X}$ be the population means of Y and X, with variances $\sigma_Y^2$ and $\sigma_X^2$, and let $\rho$ be the correlation coefficient between Y and X. Cochran [Cochran (1977)] gives the mean squared error of $\hat{y}$. Al-Omari et al. [Al-Omari, Ibrahim and Jemain (2009)] suggested ratio-type estimators of the population mean using SRS. Subzar et al. [Subzar, Maqbool, Raja et al. (2019)] introduced a new ratio estimator as an alternative to the regression estimator using auxiliary information. Moreover, Subzar et al. [Subzar, Maqbool, Raja et al. (2018)] introduced ratio estimators for the population mean in simple random sampling using supplementary information. For more details about ratio and regression estimators, see [Jemain, Al-Omari and Ibrahim (2008); Krasker (1980); Krasker and Welsch (1982); Subzar, Bouza and Al-Omari (2019b); Bouza, Al-Omari, Santiago et al. (2017); Yu and Yao (2017)].

The rest of this paper is organized as follows. The robust regression techniques are described in Section 2, and the suggested ratio estimators are presented with their main properties in Section 3. In Section 4, the efficiency of the OLS method is compared with that of the robust regression techniques. Numerical illustrations are provided in Section 5, and an application to real data is given in Section 6. Finally, the paper is concluded in Section 7.

Robust regression techniques
In this section, we summarize the main robust regression methods considered in this study.

Huber-M estimation function
The M-estimator is a well-known estimator advocated by Huber [Huber (1973)]. It is defined through an objective function; the influence function is obtained by taking the derivative of that function, and the tuning constant $Q$ defines the center and the tails.
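The roles of the objective function, its derivative, and the tuning constant can be illustrated numerically. The following is a minimal sketch rather than the implementation used in this study: it assumes the standard textbook Huber form (quadratic for |r| <= Q, linear beyond), the common default Q = 1.345, a residual scale based on the median absolute deviation, and iteratively reweighted least squares for the simple line y = a + b*x. All function names and the toy data are hypothetical.

```python
import random
import statistics

def huber_rho(r, Q=1.345):
    """Huber objective: quadratic for |r| <= Q (center), linear beyond (tails)."""
    a = abs(r)
    return 0.5 * r * r if a <= Q else Q * a - 0.5 * Q * Q

def huber_psi(r, Q=1.345):
    """Derivative of huber_rho: the residual clipped to [-Q, Q]."""
    return max(-Q, min(Q, r))

def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b

def huber_line(x, y, Q=1.345, n_iter=50):
    """Huber M-fit of y = a + b*x by iteratively reweighted least squares."""
    a, b = weighted_line(x, y, [1.0] * len(x))  # OLS starting values
    for _ in range(n_iter):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = statistics.median(r)
        # Robust scale: normalized median absolute deviation of the residuals.
        scale = statistics.median(abs(ri - med) for ri in r) / 0.6745
        if scale == 0:
            break
        # IRLS weight psi(u)/u with u = r/scale: 1 in the center, Q/|u| in the tails.
        w = [1.0 if abs(ri) <= Q * scale else Q * scale / abs(ri) for ri in r]
        a, b = weighted_line(x, y, w)
    return a, b

# Toy data (assumed): a line with intercept 2 and slope 3, plus one gross outlier.
random.seed(1)
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.1) for xi in x]
y[9] += 100.0
ols = weighted_line(x, y, [1.0] * len(x))
rob = huber_line(x, y)
```

On such data the OLS intercept is pulled toward the outlier, while the Huber fit stays close to the underlying line because the outlying residual is clipped at Q.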

Generalized M estimation function
The generalized M-estimate (GM-estimate) was proposed to provide reliable results. The general class of GM-estimators is defined through a function ψ, as in the case of the M-estimate. Mallows [Mallows (1975)] proposed the Mallows GM-estimate to make the M-estimate resistant to high-leverage outliers.

Mallows GM estimation function
In the Mallows GM-estimate, the weight $w_i$ ensures that observations with high leverage receive less weight than observations with small leverage.
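How a leverage-based weight behaves can be seen in a small example. The sketch below assumes one common choice from the literature, w_i = sqrt(1 - h_ii), where h_ii is the diagonal of the hat matrix for the simple regression y = a + b*x. This choice, the function names, and the toy data are assumptions made for illustration and are not necessarily the weights used in this study.

```python
import math

def leverages(x):
    """Hat-matrix diagonal h_ii for the simple regression y = a + b*x."""
    n = len(x)
    xm = sum(x) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    return [1.0 / n + (xi - xm) ** 2 / sxx for xi in x]

def mallows_weights(x):
    """Assumed weight w_i = sqrt(1 - h_ii): near 1 for typical x, near 0 for remote x."""
    return [math.sqrt(max(0.0, 1.0 - h)) for h in leverages(x)]

# Toy auxiliary values (assumed): the last point lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 50.0]
h = leverages(x)
w = mallows_weights(x)
```

The remote point x = 50 has a leverage close to 1 and therefore a weight close to 0, so it contributes little to the estimating equation whatever its residual is.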

Schweppe GM estimation function
The Schweppe GM-estimate was suggested by Handschin et al. [Handschin, Kohlas, Fiechter et al. (1975)] as the solution of an estimating equation that adjusts the leverage weights according to the size of the residual $r_i$.
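The difference between the Mallows and Schweppe forms is easiest to see when the two estimating equations are written side by side. The block below gives the forms commonly stated in the robust regression literature; the notation ($x_i$ for the regressor vector, $\hat\sigma$ for a robust residual scale) is an assumption and may differ from the notation of this study.

```latex
% Mallows form: the weight w_i multiplies psi and does not depend on the residual.
\sum_{i=1}^{n} w_i \, \psi\!\left( \frac{r_i}{\hat\sigma} \right) x_i = 0

% Schweppe form: the weight also rescales the residual inside psi,
% so a high-leverage point is downweighted only when its residual is large.
\sum_{i=1}^{n} w_i \, \psi\!\left( \frac{r_i}{w_i \, \hat\sigma} \right) x_i = 0
```

With $\psi$ equal to the Huber function, a high-leverage observation with a small residual satisfies $\psi(r_i / (w_i \hat\sigma)) = r_i / (w_i \hat\sigma)$, so the weight cancels in the Schweppe form and the observation keeps its full contribution.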

SIS GM estimation function
Coakley et al. [Coakley and Hettmansperger (1993)] proposed the Schweppe one-step (SIS) estimate, which extends the original Schweppe estimator. In the SIS estimator, the weight $w_i$ is defined in the same way as in Schweppe's GM-estimate.
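The idea of a one-step estimate can be sketched as a single Newton-type update of the Schweppe estimating equation from a starting value. The code below is a simplified illustration under stated assumptions, not the estimator as defined by Coakley and Hettmansperger: the starting value and the leverage weights are supplied by the caller, the scale is a normalized median absolute deviation, ψ is the Huber function with tuning constant Q, and the model is the simple line y = a + b*x. All names are hypothetical.

```python
import statistics

def psi(v, Q):
    """Huber psi: v clipped to [-Q, Q]."""
    return max(-Q, min(Q, v))

def schweppe_one_step(x, y, start, w, Q=1.345):
    """One Newton-type step for y = a + b*x from start = (a0, b0) with weights w."""
    a0, b0 = start
    r = [yi - a0 - b0 * xi for xi, yi in zip(x, y)]
    med = statistics.median(r)
    sigma = statistics.median(abs(ri - med) for ri in r) / 0.6745
    if sigma == 0:
        raise ValueError("residual scale is zero; the step is undefined")
    v = [ri / (sigma * wi) for ri, wi in zip(r, w)]
    # Right-hand side: sum of sigma * w_i * psi(v_i) * (1, x_i).
    g0 = sum(sigma * wi * psi(vi, Q) for wi, vi in zip(w, v))
    g1 = sum(sigma * wi * psi(vi, Q) * xi for wi, vi, xi in zip(w, v, x))
    # Matrix: sum of psi'(v_i) * (1, x_i)(1, x_i)', with psi' = 1 inside [-Q, Q], else 0.
    d = [1.0 if abs(vi) <= Q else 0.0 for vi in v]
    b00 = sum(d)
    b01 = sum(di * xi for di, xi in zip(d, x))
    b11 = sum(di * xi * xi for di, xi in zip(d, x))
    det = b00 * b11 - b01 * b01
    # Solve the 2x2 system explicitly.
    da = (b11 * g0 - b01 * g1) / det
    db = (b00 * g1 - b01 * g0) / det
    return a0 + da, b0 + db

# Sanity check on toy data (assumed): with unit weights and no clipping,
# one step from any start reproduces the OLS fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a1, b1 = schweppe_one_step(x, y, (0.0, 1.0), [1.0] * 5, Q=1e9)
```

The sanity check reflects the design of the update: when no residual is clipped and all weights equal 1, the step reduces to the ordinary least squares correction, so robustness enters only through the clipping and the leverage weights.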

Suggested estimators
In this section, the proposed ratio estimators are presented. They are based on the supplementary information of Bowley's coefficient of skewness with quartiles. For estimating the parameters, we consider the OLS method, the Huber M-estimate, the Mallows GM-estimate, the Schweppe GM-estimate, and the SIS GM-estimate. The proposed estimators are as follows.

Using the OLS method
In the estimators based on the OLS method, Bowley's coefficient of skewness is defined as $(Q_3 + Q_1 - 2Q_2)/(Q_3 - Q_1)$, where $Q_i$ is the $i$th quartile. The mean squared error expressions for these estimators can be derived as follows. For the estimator given in Eq. (7), following Wolter [Wolter (1985)], Eq. (10) can be applied to the proposed estimator; squaring both sides of the resulting equation and taking the expectation yields its MSE. The MSEs of the other estimators, beginning with Eq. (8), are obtained similarly.
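Bowley's coefficient of skewness depends only on the three quartiles, so it can be computed directly from the auxiliary variable. The following minimal sketch uses the standard definition (Q3 + Q1 - 2*Q2)/(Q3 - Q1); the quartile convention (linear interpolation, the "inclusive" method of Python's statistics module) is an assumption, since the convention used in this study is not stated, and the data are made up.

```python
import statistics

def bowley_skewness(data):
    """Bowley's quartile skewness (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 3, 10, 20, 50]
sk_sym = bowley_skewness(symmetric)
sk_right = bowley_skewness(right_skewed)
```

The coefficient is zero for data whose quartiles are symmetric about the median and positive when the upper quartile lies farther from the median than the lower one. Because it uses quartiles only, a few extreme observations do not change it, which is the property that makes it attractive as auxiliary information when outliers are present.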

Using the Huber M-estimation
The suggested estimators based on the Huber M-estimation are given by

Using the Mallows GM-estimate
The suggested estimators based on the Mallows GM-estimate method with their mean squared error expressions are provided here as

Using the Schweppe GM-estimate
The Schweppe GM-estimate is used to suggest the following estimators, beginning with Eq. (26), together with their MSE expressions.

Using the SIS GM-estimate
The suggested ratio estimators using the SIS GM-estimate are given by

Efficiency comparison of the OLS method with robust regression techniques
In this section, the OLS method is compared theoretically with the robust regression methods for the estimators considered in this study. The comparison covers the robust regression techniques (Huber M-estimate, Mallows GM-estimate, Schweppe GM-estimate, and SIS GM-estimate) applied to the ratio estimators proposed in the present study. If one of the resulting conditions (38) or (39) is satisfied, the proposed estimators using the mentioned robust regression methods are more efficient than the usual ratio estimators based on the OLS method.

Numerical illustration
For numerical illustration, a real data set is selected from the Division of Agricultural Statistics, Faculty of Horticulture, Shalimar. It contains the apple production amount (the variate of interest) and the number of apple trees (the auxiliary variate) in 499 villages of District Baramulla of Jammu and Kashmir from 2010 to 2011. (Source: RCM project, pilot survey for estimation of cultivation and production of apple in District Baramulla, RCM approved project). First, we stratified the data by area, and from each stratum (region) the samples (villages) were selected randomly.
Here, we have taken the sample size to be 170. We joined two areas and then chose four strata, each containing three blocks (1: Zaniger, Boniyar, Tangmarg; 2: Wagoora, Sopore, Baramulla; 3: Uri, Pattan, Rohama; 4: Rafiabad, Kunzer, Singapore). However, in the present study we use only the data of Uri, Pattan, and Rohama of district Baramulla of Jammu and Kashmir, because the interest is in simple random sampling. We applied our proposed ratio estimators to the data on apple production amount and number of apple trees in 117 villages of Uri, Pattan, and Rohama, in which the apple production (in tons) is denoted by Y (study variable) and the number of apple trees by X (auxiliary variable, 1 unit = 100 trees). The characteristics of the data set are given in the corresponding table.

In the expression for the estimated mean squared error, $\hat{Y}_i$ is the estimate in replication $i = 1, 2, \ldots, 5000$ and $\bar{Y}$ is the population mean. Different sample sizes, $n = 20, 30, 40, 50, 60$, are considered to investigate the performance of the suggested estimators using the Mallows GM-estimate, Schweppe GM-estimate, and SIS GM-estimate compared with the estimators using the OLS and Huber-M methods. The relative efficiency is also computed, and the results are summarized in Tab. 5. It turns out that the estimators based on the OLS method do not give precise results in the presence of outliers. With the above-mentioned robust regression techniques the suggested estimators perform better, and their advantage grows as the sample size increases. The Mallows GM-estimate, Schweppe GM-estimate, and SIS GM-estimate are also better than the Huber-M estimate. Moreover, as the sample size increases, these techniques give precise results in the presence of outliers.
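The simulation design described above, with repeated SRSWOR samples, an empirical MSE around the population mean, and a relative efficiency, can be sketched as follows. This is an illustration under stated assumptions, not the code behind Tab. 5: the empirical MSE is taken as the average of the squared deviations of the estimates from the population mean over the replications, the relative efficiency is taken as the ratio of the reference MSE to the competitor MSE (the direction is an assumption), the population is synthetic with a Gamma-distributed auxiliary variable, and the sample mean and the classical ratio estimator stand in for the estimators of this study. All names are hypothetical.

```python
import random
import statistics

def simulate_mse(y_pop, x_pop, estimator, n, reps=5000, seed=0):
    """Empirical MSE of `estimator` over `reps` SRSWOR samples of size n."""
    rng = random.Random(seed)
    N = len(y_pop)
    y_bar = statistics.fmean(y_pop)  # population mean of the study variable
    x_bar = statistics.fmean(x_pop)  # known population mean of the auxiliary variable
    total = 0.0
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # one sample drawn without replacement
        ys = [y_pop[i] for i in idx]
        xs = [x_pop[i] for i in idx]
        total += (estimator(ys, xs, x_bar) - y_bar) ** 2
    return total / reps

def sample_mean(ys, xs, x_bar):
    """Reference estimator: the plain sample mean."""
    return statistics.fmean(ys)

def ratio_estimator(ys, xs, x_bar):
    """Classical ratio estimator: (ybar / xbar) * Xbar."""
    return statistics.fmean(ys) / statistics.fmean(xs) * x_bar

# Synthetic skewed population (assumed): X ~ Gamma(shape=2, scale=5), Y = 2X + noise.
pop_rng = random.Random(42)
x_pop = [pop_rng.gammavariate(2.0, 5.0) for _ in range(500)]
y_pop = [2.0 * xi + pop_rng.gauss(0.0, 1.0) for xi in x_pop]

mse_mean = simulate_mse(y_pop, x_pop, sample_mean, n=30)
mse_ratio = simulate_mse(y_pop, x_pop, ratio_estimator, n=30)
relative_efficiency = mse_mean / mse_ratio
```

Under this convention a relative efficiency above 1 means that the competitor has the smaller empirical MSE. Any estimator with the same three-argument signature, for example a ratio estimator built on a robust slope, can be passed to simulate_mse in the same way.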

Conclusion
The results of this study reveal that, with the robust Mallows GM-estimates, Schweppe GM-estimates, and SIS GM-estimates, the proposed estimators of the population mean perform better than their competitors based on the OLS and Huber-M methods. Hence, we strongly recommend using the suggested estimators with the Mallows GM-estimates, Schweppe GM-estimates, and SIS GM-estimates, in preference to the OLS and Huber-M estimation methods, to estimate the population parameters in the presence of outliers. The suggested estimators in this paper can be modified for other sampling methods, such as ranked set sampling and median ranked set sampling. For illustration, see [Haq, Brown, Moltchanova et al. (2016a, 2016b); Al-Omari and Haq (2019); Haq, Brown, Moltchanova et al. (2015); Al-Nasser and Al-Omari (2018); Jemain, Al-Omari and Ibrahim (2007); Zamanzade and Al-Omari (2016)].