A robust unbiased dual to product estimator for population mean through modified maximum likelihood in simple random sampling

In the simple random sampling setting, the ratio estimator is more efficient than the sample mean under simple random sampling without replacement if $\rho_{yx} > \frac{1}{2}\frac{C_x}{C_y}$, provided $R > 0$, which is usually the case. This shows that if the auxiliary information is such that $\rho_{yx} < \frac{1}{2}\frac{C_x}{C_y}$, then we cannot use the ratio method of estimation to improve on the sample mean as an estimator of the population mean. There is therefore a need for another type of estimator which also makes use of information on the auxiliary variable $x$. The product method of estimation is an attempt in this direction. Product-type estimators are widely used for estimating the population mean when the correlation between the study and auxiliary variables is highly negative. This paper is devoted to the estimation of the population mean using an unbiased dual to product estimator that incorporates robust modified maximum likelihood estimators (MMLEs). Its properties are obtained theoretically. To support the theoretical results, simulation studies under several super-population models are carried out, and the robustness properties of the modified estimator are studied. We show that using MMLEs in estimating a finite population mean results in robust estimates, which is very gainful under non-normality or common data anomalies such as outliers.

ABOUT THE AUTHORS
Sanjay Kumar obtained his Ph.D. from Banaras Hindu University, Varanasi, India. He has been working as an Assistant Professor since 2011 in the Department of Statistics, Central University of Rajasthan, Ajmer, Rajasthan, India. His research interests include estimation, optimization problems and robustness studies in sampling theory.
Priyanka Chhaparwal is currently working as a research scholar at the Department of Statistics, Central University of Rajasthan, Ajmer, Rajasthan, India. Her research area includes estimation problems in sampling theory.

PUBLIC INTEREST STATEMENT
In sampling theory, obtaining estimators of the parameters of interest with more precision is an important objective of any statistical estimation procedure in agriculture, medicine and the social sciences. An example is estimating the quantity of fruit produced in a village. Supplementary information obtained from an auxiliary variable helps to improve the efficiency of the estimators. In the fruit example, the size of the plots can be used as supplementary information. Several authors have studied such problems under normality. In this paper, we consider the case where the underlying distribution is not normal, which is more realistic in real-life situations. We support the theoretical results with simulations under several super-population models and study the robustness property of the modified estimator.

Kumar & Chhaparwal, Cogent Mathematics (2016), 3: 1168070
https://doi.org/10.1080/23311835.2016.1168070
© 2019 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.
Received: 30 September 2015
Accepted: 04 March 2016
First Published: 06 April 2016
Corresponding author: Sanjay Kumar, Department of Statistics, Central University of Rajasthan, Bandarsindri, Kishangarh, Ajmer, Rajasthan 305817, India
E-mail: sanjay.kumar@curaj.ac.in
Reviewing editor: Guohua Zou, Chinese Academy of Sciences, China
Additional information is available at the end of the article



Introduction
The use of additional information supplied by auxiliary variables in sample surveys has been considered mainly in actuarial science, medicine, agriculture and social science, at the stages of organization, design, collection of units and development of the estimation procedure. The use of such auxiliary information in sample surveys was studied by Cochran (1940), who used it for estimating yields of agricultural crops. The product method of estimation is a popular estimation method in sampling theory. For the case of negative correlation between the study variable and the auxiliary variable, Robson (1957) defined a product estimator for the population mean, which was revisited by Murthy (1967). The product estimator performs better than the simple mean per unit estimator under certain conditions. The use of auxiliary information in sample surveys is treated at length in the books by Yates (1960), Cochran (1977) and Sukhatme, Sukhatme, and Asok (1984). Further, Jhajj, Sharma, and Grover (2006), Bouza (2008, 2015), Swain (2013) and Chanu and Singh (2014) studied the use of auxiliary information under different sampling designs for improving several estimators.
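Let $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i$ and $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i$ be the population means of the study variable $y$ and the auxiliary variable $x$, respectively, for the population $U: (U_1, U_2, \ldots, U_N)$, and let $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})^2$ and $S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{X})^2$, where $S_y^2$ and $S_x^2$ are the population mean squares for the study variable $y$ and the auxiliary variable $x$. The traditional product estimator for the population mean $\bar{Y}$ proposed by Murthy (1964) is given by

$$\bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}} \qquad (1.1)$$

where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $n$ is the size of the sample.

The bias and, to the first order of approximation, the mean square error (MSE) of the estimator $\bar{y}_p$ are given by

$$B(\bar{y}_p) = \frac{1-f}{n}\,\frac{S_{yx}}{\bar{X}} \qquad (1.2)$$

$$MSE(\bar{y}_p) = \frac{1-f}{n}\left(S_y^2 + R^2 S_x^2 + 2R\,S_{yx}\right) \qquad (1.3)$$

where $R = \bar{Y}/\bar{X}$, $f = n/N$ and $S_{yx}$ is the covariance between the study variable and the auxiliary variable.

An unbiased estimator $\bar{y}_{pu}$ of the population mean $\bar{Y}$, obtained by correcting the bias of $\bar{y}_p$, is given by

$$\bar{y}_{pu} \cong \bar{y}_p - B(\bar{y}_p) \qquad (1.4)$$

Bandopadhyay (1980) proposed a dual to product estimator, which is given by

$$t_1 = \frac{\bar{y}\,\bar{X}}{\bar{z}} \qquad (1.5)$$

where the sample mean of $z$ is $\bar{z} = \frac{N\bar{X} - n\bar{x}}{N-n}$ and the population mean of $z$ is $\bar{Z} = \bar{X}$. The transformed variable $z$ is negatively correlated with $x$ and positively correlated with $y$.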
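The two estimators above can be illustrated with a short Python sketch. It assumes the usual textbook forms $\bar{y}_p = \bar{y}\,\bar{x}/\bar{X}$ and $t_1 = \bar{y}\,\bar{X}/\bar{z}$; the function names and the toy data are hypothetical and serve only as an illustration.

```python
from statistics import mean


def product_estimator(y, x, X_bar):
    """Classical product estimator: y_bar * x_bar / X_bar."""
    return mean(y) * mean(x) / X_bar


def dual_to_product_estimator(y, x, X_bar, N):
    """Dual to product estimator: y_bar * X_bar / z_bar,
    where z_bar = (N * X_bar - n * x_bar) / (N - n)."""
    n = len(y)
    z_bar = (N * X_bar - n * mean(x)) / (N - n)
    return mean(y) * X_bar / z_bar


# Hypothetical sample of n = 5 units from a population of N = 50 units
# whose auxiliary mean X_bar is assumed known.
y_sample = [12.0, 9.5, 11.0, 8.0, 10.5]
x_sample = [2.2, 3.3, 2.6, 3.8, 3.1]
print(product_estimator(y_sample, x_sample, X_bar=2.8))
print(dual_to_product_estimator(y_sample, x_sample, X_bar=2.8, N=50))
```

When the sample mean $\bar{x}$ happens to equal $\bar{X}$, both estimators reduce to $\bar{y}$; when $\bar{x} > \bar{X}$, both adjust $\bar{y}$ upwards, which is the intended behaviour under negative correlation.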
To the first order of approximation, the bias and the MSE of the estimator $t_1$ are given by

$$B(t_1) = \frac{1-f}{n}\,\bar{Y}\,\gamma\,C_x^2\,(\gamma + k) \qquad (1.6)$$

$$MSE(t_1) = \frac{1-f}{n}\,\bar{Y}^2\left[C_y^2 + \gamma\,C_x^2\,(\gamma + 2k)\right] \qquad (1.7)$$

where $\rho_{yx}\ (<0)$ is the correlation between $y$ and $x$, $k = \rho_{yx}\,C_y/C_x$, $\gamma = n/(N-n)$, $C_y = S_y/\bar{Y}$ and $C_x = S_x/\bar{X}$. The estimator $t_1$ is preferred to $\bar{y}_p$ when $k > -\frac{1}{2}(1+\gamma)$ and $(1-\gamma) > 0$, $k$ being negative because $\rho_{yx} < 0$.

Further, by using this transformation and applying the technique of Hartley and Ross (1954), we have an unbiased dual to product estimator $t_2$ (see Singh, 2003), given by (1.8). The variance of $t_2$ to $O(1/n)$ is given by (1.9).

However, in all of the studies mentioned above, the underlying distribution of $y$ is assumed to be normal. In this paper, we consider the case where the underlying distribution is not normal, which is more realistic in real-life situations. Zheng and Al-Saleh (2002) and Islam, Shaibur, and Hossain (2009) have studied the effectiveness of modified maximum likelihood estimators (MMLEs), which play a key role in increasing the efficiency of estimators. Using the modified maximum likelihood (MML) methodology (see Tiku, Tan, and Balakrishnan, 1986), we propose a new dual to product type estimator that is based on order statistics. We show that the proposed estimator always has a smaller MSE than the corresponding unbiased dual to product estimator (1.8), unless the underlying distribution is normal. When the underlying distribution is normal, both estimators have exactly the same MSE. We support the theoretical results with simulations under several super-population models and study the robustness property of the modified dual to product estimator. We show that using the MMLE for estimating a finite population mean results in robust estimates, which is very gainful under non-normality or other common data anomalies such as outliers.

Long-tailed symmetric family
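For the super-population linear regression model $y_i = \theta x_i + e_i$, $i = 1, 2, \ldots, n$, let the underlying distribution of the study variable $y$ belong to the long-tailed symmetric (LTS) family

$$f(y) = \frac{\Gamma(p)}{\sigma\sqrt{k}\,\Gamma(1/2)\,\Gamma(p - 1/2)}\left\{1 + \frac{(y-\mu)^2}{k\sigma^2}\right\}^{-p}, \quad -\infty < y < \infty \qquad (2.1)$$

where $k = 2p - 3$ and $p \ge 2$ is the shape parameter, with $E(y) = \mu$ and $V(y) = \sigma^2$.

We note that when $p = \infty$, (2.1) reduces to a normal distribution. The likelihood function obtained from (2.1) is given by

$$L \propto \sigma^{-n}\prod_{i=1}^{n}\left\{1 + \frac{z_i^2}{k}\right\}^{-p}, \quad z_i = \frac{y_i - \mu}{\sigma}.$$

The MLE of $\mu$ (assuming $\sigma$ is known) is the solution of the likelihood equation

$$\frac{d \ln L}{d\mu} = \frac{2p}{k\sigma}\sum_{i=1}^{n} g(z_i) = 0, \quad g(z) = \frac{z}{1 + z^2/k} \qquad (2.2)$$

which does not have an explicit solution. Vaughan (1992a) showed that Equation (2.2) has multiple roots for all $p < \infty$, and that the number of roots increases as $n$ increases.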
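The robust MMLE, which is known to be asymptotically equivalent to the MLE, is obtained in the following three steps (see also Oral, 2010):
(1) The likelihood equations are expressed in terms of the ordered variates $z_{(i)} = (y_{(i)} - \mu)/\sigma$, $1 \le i \le n$.
(2) The function $g(z_{(i)})$ is linearized by using the first two terms of a Taylor series expansion around $t_{(i)} = E(z_{(i)})$.
(3) The resulting modified likelihood equation is solved for the estimator.

The values of $t_{(i)}$, $1 \le i \le n$, are given in Tiku and Kumra (1981) for $p = 2(0.5)10$ and in Vaughan (1992b) for $p = 1.5$ when $n \le 20$. For $n > 20$, approximate values of $t_{(i)}$ can be used, which are obtained from the equations

$$\int_{-\infty}^{t_{(i)}} f(z)\,dz = \frac{i}{n+1}, \quad 1 \le i \le n,$$

where $f(z)$ is the density (2.1) with $\mu = 0$ and $\sigma = 1$.

In terms of the ordered variates we now have

$$\frac{d \ln L}{d\mu} = \frac{2p}{k\sigma}\sum_{i=1}^{n} g(z_{(i)}) = 0 \qquad (2.5)$$

A Taylor series expansion of $g(z_{(i)})$ around $t_{(i)}$, keeping the first two terms, gives

$$g(z_{(i)}) \cong \alpha_i + \beta_i z_{(i)}, \quad 1 \le i \le n \qquad (2.6)$$

where

$$\alpha_i = \frac{(2/k)\,t_{(i)}^3}{\left\{1 + t_{(i)}^2/k\right\}^2}, \quad \beta_i = \frac{1 - t_{(i)}^2/k}{\left\{1 + t_{(i)}^2/k\right\}^2} \qquad (2.7)$$

Further, for symmetric distributions, it may be noted that $t_{(i)} = -t_{(n-i+1)}$ and hence

$$\alpha_i = -\alpha_{n-i+1}, \quad \beta_i = \beta_{n-i+1}, \quad \sum_{i=1}^{n}\alpha_i = 0 \qquad (2.8)$$

Now, using (2.6) and (2.7) in (2.5), we have the modified likelihood equation

$$\frac{d \ln L^*}{d\mu} = \frac{2p}{k\sigma}\sum_{i=1}^{n}\left(\alpha_i + \beta_i z_{(i)}\right) = 0 \qquad (2.9)$$

Hence, the solution of (2.9) is the MMLE $\hat{\mu}$, given by

$$\hat{\mu} = \frac{1}{m}\sum_{i=1}^{n}\beta_i\,y_{(i)} \qquad (2.10)$$

where $m = \sum_{i=1}^{n}\beta_i$. Tiku and Vellaisamy (1996) showed the result (2.12). The exact variance of $\hat{\mu}$ is given by

$$V(\hat{\mu}) = \frac{\left(\beta'\,\Omega\,\beta\right)\sigma^2}{m^2}$$

where $\beta$ is the column vector of the coefficients $\beta_i$, $\Omega$ is the variance-covariance matrix of the standardized order statistics $z_{(i)}$, and $\omega'$ is the $1 \times n$ row vector with elements $1/n$. The elements of $\Omega$ are tabulated in Tiku and Kumra (1981) and Vaughan (1992b).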
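The MML steps for the location parameter can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes the standard LTS coefficients $\alpha_i = (2/k)t_{(i)}^3/\{1+t_{(i)}^2/k\}^2$ and $\beta_i = (1-t_{(i)}^2/k)/\{1+t_{(i)}^2/k\}^2$ with $k = 2p-3$, and it uses the approximate $t_{(i)}$ from the quantile equation $\int_{-\infty}^{t_{(i)}} f(z)\,dz = i/(n+1)$ for every $n$, whereas the text uses tabulated exact values when $n \le 20$. All function names are hypothetical.

```python
import math


def lts_cdf(t, p, steps=400):
    """CDF of the standardized LTS(p) variate via composite Simpson's rule,
    integrating from 0 and using symmetry about 0."""
    if t < 0:
        return 1.0 - lts_cdf(-t, p, steps)
    if t == 0:
        return 0.5
    k = 2.0 * p - 3.0
    c = math.gamma(p) / (math.sqrt(k * math.pi) * math.gamma(p - 0.5))
    h = t / steps
    total = 1.0 + (1.0 + t * t / k) ** (-p)  # kernel at 0 and at t
    for j in range(1, steps):
        u = j * h
        total += (4 if j % 2 else 2) * (1.0 + u * u / k) ** (-p)
    return 0.5 + c * total * h / 3.0


def lts_quantile(q, p):
    """Solve lts_cdf(t) = q by bisection."""
    if q < 0.5:
        return -lts_quantile(1.0 - q, p)
    lo, hi = 0.0, 1.0
    while lts_cdf(hi, p) < q:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if lts_cdf(mid, p) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mml_coefficients(n, p):
    """Coefficients (alpha_i, beta_i) from approximate t_(i) (an assumption)."""
    k = 2.0 * p - 3.0
    alphas, betas = [], []
    for i in range(1, n + 1):
        t = lts_quantile(i / (n + 1.0), p)
        d = (1.0 + t * t / k) ** 2
        alphas.append((2.0 / k) * t ** 3 / d)
        betas.append((1.0 - t * t / k) / d)
    return alphas, betas


def mml_location(y, p):
    """MMLE of mu: weighted mean of the order statistics with weights beta_i / m."""
    _, betas = mml_coefficients(len(y), p)
    m = sum(betas)
    return sum(b * v for b, v in zip(betas, sorted(y))) / m


# Hypothetical sample containing one outlier.
print(mml_location([1.0, 2.0, 3.0, 4.0, 50.0], p=3.5))
```

Because the weights $\beta_i$ are largest in the middle of the ordered sample and smallest at the extremes, an outlying observation contributes less to $\hat{\mu}$ than it does to the sample mean.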
When $\sigma$ is not known, the MMLE $\hat{\sigma}$ can be obtained as given by Tiku and Suresh (1992) and Tiku and Vellaisamy (1996).

The MML methodology is employed in situations where maximum likelihood (ML) estimation is intractable, and it has been widely used, for example by Puthenpura and Sinha (1986), Tiku and Suresh (1992), Oral (2006, 2010) and Oral and Kadilar (2011). Under some regularity conditions, MMLEs have exactly the same asymptotic properties as ML estimators (MLEs), as discussed in Vaughan and Tiku (2000), and for small $n$ they are known to be essentially as efficient as MLEs.

The proposed dual to product estimator and its variance
In the context of sampling theory, Tiku and Bhasin (1982) and Tiku and Vellaisamy (1996) used the MMLE (2.10) and showed that utilizing MMLEs leads to improvements in efficiency when estimating the finite population mean.
Motivated by this approach, we propose a new unbiased dual to product estimator $T_1$, which is given by (3.1), assuming that the population mean $\bar{X}$ of the auxiliary variable is known.
The variance of the proposed estimator $T_1$, up to terms of order $n^{-1}$, is obtained under simple random sampling without replacement. It involves the term $\mathrm{Cov}(\hat{\mu}, \bar{x})$.

Monte-Carlo simulation study
For the simulation study we used the R programming software, in a way similar to the simulation study that Oral and Kadilar (2011) carried out in FORTRAN. We use the following model to generate the super-populations:

$$y_i = \theta x_i + e_i, \quad i = 1, 2, \ldots, N \qquad (4.1)$$

where we generate $e_i$ and $x_i$ independently and calculate $y_i$ for $i = 1, 2, \ldots, N$. Let the random observations $e_i$, $i = 1, 2, \ldots, N$, be from (2.1) with $E(e) = 0$ and $V(e) = \sigma_e^2$. Let the population $U_N$ consist of the $N$ pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$. To calculate the MSE of the proposed estimator in (3.1) exactly, we would have to calculate $T_1$ for all $\binom{N}{n}$ possible simple random samples of size $n\ (= 5, 11, 15)$ from $U_N$.
Since $\binom{N}{n}$ is extremely large, we proceed as follows.
We consider $N = 500$ and, from the finite population $U_{500}$ of pairs generated from an assumed super-population, we select samples of size $n\ (= 5, 11, 15)$ by simple random sampling without replacement. We choose at random $S = 10{,}000$ samples from all the possible $\binom{500}{n}$ samples of size $n$, which gives 10,000 values of $T_1$. To compare the efficiency of the proposed estimator under different models for a given $n$, we calculate the mean square errors as follows:

$$MSE(T_1) = \frac{1}{S}\sum_{j=1}^{S}\left(T_{1j} - \bar{Y}\right)^2$$

where $T_{1j}$ is the value of $T_1$ for the $j$th sample.

To set the population correlation $\rho_{yx}$ to a sufficiently high value, as studied by Oral and Kadilar (2011), we choose the value of the parameter $\theta$ in the model $y = \theta x + e$ such that the correlation coefficient between the study variable $y$ and the auxiliary variable $x$ is $\rho_{yx}$. To determine the value of $\theta$ that satisfies this condition, we follow an approach similar to that of Rao and Beegle (1967) and write the population correlation between $y$ and $x$ as

$$\rho_{yx} = \frac{\theta\,\sigma_x}{\sqrt{\theta^2\sigma_x^2 + \sigma_e^2}}, \quad \text{so that} \quad \theta^2 = \frac{\rho_{yx}^2\,\sigma_e^2}{\sigma_x^2\left(1 - \rho_{yx}^2\right)}.$$

For example, if $x \sim U(0, 1)$, then $\sigma_x^2 = 1/12$ and the value of $\theta$ for which the population correlation between $y$ and $x$ becomes $\rho_{yx}$ satisfies $\theta^2 = 12\rho_{yx}^2\sigma_e^2/(1 - \rho_{yx}^2)$; for other distributions of $x$ the corresponding values of $\theta$ can be calculated accordingly. Here we take $\sigma^2 = 1$ in all situations, without loss of generality, and calculate the required parameter $\theta$ for which $\rho_{yx} = -0.45$, taking the negative root because $\rho_{yx} < 0$.
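The calculation of $\theta$ and the Monte Carlo approximation of the MSE can be sketched in Python as below. The sketch assumes the relation $\theta^2 = \rho_{yx}^2\sigma_e^2/\{\sigma_x^2(1-\rho_{yx}^2)\}$, which follows from $y = \theta x + e$ with $x$ and $e$ independent, and an empirical MSE defined as the average squared deviation from $\bar{Y}$ over $S$ samples. The function names and the estimator signature are hypothetical, and any estimator with that signature can be plugged in as a stand-in for $T_1$.

```python
import math
import random


def theta_for_correlation(rho, var_x, var_e=1.0):
    """theta giving corr(y, x) = rho in y = theta * x + e (x, e independent)."""
    theta_sq = rho ** 2 * var_e / (var_x * (1.0 - rho ** 2))
    return math.copysign(math.sqrt(theta_sq), rho)


def empirical_mse(estimator, xs, ys, n, S=10000, seed=0):
    """Average of (estimate - Y_bar)^2 over S simple random samples
    drawn without replacement from the finite population (xs, ys)."""
    rng = random.Random(seed)
    N = len(ys)
    Y_bar = sum(ys) / N
    X_bar = sum(xs) / N
    total = 0.0
    for _ in range(S):
        idx = rng.sample(range(N), n)  # SRSWOR of size n
        y_s = [ys[i] for i in idx]
        x_s = [xs[i] for i in idx]
        total += (estimator(y_s, x_s, X_bar, N) - Y_bar) ** 2
    return total / S


def sample_mean(y_s, x_s, X_bar, N):
    """Baseline estimator that ignores the auxiliary variable."""
    return sum(y_s) / len(y_s)


# theta for x ~ U(0, 1) (variance 1/12) and for x ~ Exp(1) (variance 1)
print(theta_for_correlation(-0.45, var_x=1.0 / 12.0))
print(theta_for_correlation(-0.45, var_x=1.0))
```

Passing each competing estimator to `empirical_mse` with the same population and seed yields MSE values that can be compared directly; relative efficiencies can then be formed as ratios of those values.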

Comparison of efficiencies of the proposed estimator
The conditions under which the proposed estimator $T_1$ is more efficient than the corresponding estimators $\bar{y}_{pu}$, $t_1$ and $t_2$ are given by (5.1) to (5.5).

We assume the two super-population models given below to see how much efficiency we gain with the proposed modified estimator when the conditions given in Section 5 are satisfied under non-normality:
(1) $x \sim U(0, 1)$ and $e \sim LTS(p, 1)$
(2) $x \sim \exp(1)$ and $e \sim LTS(p, 1)$

For models (1) and (2), the values of $\theta$ which make the population correlation $\rho_{yx} = -0.45$ are given in Table 1.
Here, we note that for the LTS family (2.1), the value of θ does not depend on the shape parameter p.
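A super-population of this kind can be generated with the Python sketch below. It relies on the fact that if $e \sim LTS(p, \sigma)$ with mean zero, then $\sqrt{\nu/k}\,e/\sigma$ follows a Student $t$ distribution with $\nu = 2p - 1$ degrees of freedom, where $k = 2p - 3$. The function names, the seed handling and the value of $\theta$ in the usage line are illustrative assumptions, and the paper's own simulations were run in R.

```python
import math
import random


def draw_lts(p, sigma, rng):
    """One draw from LTS(p, sigma) with mean 0, via the Student t relation."""
    nu = 2.0 * p - 1.0
    k = 2.0 * p - 3.0
    # Student t with nu degrees of freedom: N(0, 1) / sqrt(chi2_nu / nu),
    # where chi2_nu is a Gamma(shape = nu / 2, scale = 2) variate.
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)
    return sigma * math.sqrt(k / nu) * t


def generate_population(N, theta, p, x_dist="uniform", sigma=1.0, seed=0):
    """Generate N pairs (x_i, y_i) from y_i = theta * x_i + e_i."""
    rng = random.Random(seed)
    if x_dist == "uniform":        # model (1): x ~ U(0, 1)
        xs = [rng.random() for _ in range(N)]
    elif x_dist == "exponential":  # model (2): x ~ exp(1)
        xs = [rng.expovariate(1.0) for _ in range(N)]
    else:
        raise ValueError("x_dist must be 'uniform' or 'exponential'")
    es = [draw_lts(p, sigma, rng) for _ in range(N)]
    ys = [theta * x + e for x, e in zip(xs, es)]
    return xs, ys


# Illustrative call: theta = -1.75 is a placeholder, not a value from Table 1.
xs, ys = generate_population(500, theta=-1.75, p=3.5, x_dist="uniform")
print(len(xs), len(ys))
```

The scaling by $\sqrt{k/\nu}$ makes the error variance equal to $\sigma^2$ for every admissible $p$, which is consistent with $\theta$ not depending on the shape parameter.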
To verify that the super-populations are generated appropriately, we provide a scatter graph and the underlying distribution for $p = 3.5$ under model (2). The values of MSE(.) and the relative efficiencies are given in Table 2 for models (1) and (2). From Table 2, we see that the proposed estimator $T_1$ is more efficient than the corresponding estimators $\bar{y}_{pu}$, $t_1$ and $t_2$ because the theoretical conditions given in Section 5 are satisfied. We also observe that the mean square error decreases as the sample size increases.

Robustness of the proposed estimator
Outliers in sample data are a common concern for survey statisticians. In practice, the shape parameter $p$ in $LTS(p, \sigma)$ might also be mis-specified. Therefore, it is very important for estimators to remain efficient under such departures, that is, to be robust.
Here, we take $N = 500$ and $\sigma^2 = 1$ without loss of generality, and we study the robustness property of the proposed estimator under different outlier models as follows.
We assume $x \sim U(0, 1)$ as well as $x \sim Exp(1)$, and $y \sim LTS(p = 3.5, \sigma^2 = 1)$. We consider the super-population models (5), (6) and (7). Model (5), the assumed super-population model, is given for the purpose of comparison, and models (6) and (7) are taken as its plausible alternatives. Here, we have assumed the super-population model $LTS(3.5, 1)$. The coefficients $(\alpha_i, \beta_i)$ from (2.7) are calculated with $p = 3.5$ and are used in models (5) and (6). The results are given in Table 3. Here, the theoretical conditions are satisfied for the models.
From Table 3, we see that the proposed estimator $T_1$ is more efficient than the corresponding estimators $\bar{y}_{pu}$, $t_1$ and $t_2$ because the theoretical conditions are satisfied. We also observe that the mean square error decreases as the sample size increases.

Determination of the shape parameter
The shape parameter $p$ may be unknown. In that case, to determine whether a particular density is appropriate for the underlying distribution of the study variable $y$, a Q-Q plot is made by plotting the population quantiles for the density against the ordered values of $y$. The population quantiles $t_{(i)}$ are determined from the equation $\int_{-\infty}^{t_{(i)}} f(u)\,du = \frac{i}{n+1}$, $1 \le i \le n$, where $n$ is the sample size.
The density whose Q-Q plot most closely approximates a straight line is taken to be the most appropriate. Using this procedure, we can also obtain a plausible value for the shape parameter.
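The Q-Q procedure can be mimicked numerically, as in the Python sketch below. Measuring the straightness of a Q-Q plot by the Pearson correlation between the ordered sample and the quantiles $t_{(i)}$ is an assumption made here for illustration; the text itself relies on visual inspection. The function names and the candidate grid are likewise hypothetical.

```python
import math


def lts_cdf(t, p, steps=400):
    """CDF of the standardized LTS(p) variate (Simpson's rule plus symmetry)."""
    if t < 0:
        return 1.0 - lts_cdf(-t, p, steps)
    if t == 0:
        return 0.5
    k = 2.0 * p - 3.0
    c = math.gamma(p) / (math.sqrt(k * math.pi) * math.gamma(p - 0.5))
    h = t / steps
    total = 1.0 + (1.0 + t * t / k) ** (-p)
    for j in range(1, steps):
        u = j * h
        total += (4 if j % 2 else 2) * (1.0 + u * u / k) ** (-p)
    return 0.5 + c * total * h / 3.0


def lts_quantile(q, p):
    """Solve lts_cdf(t) = q by bisection."""
    if q < 0.5:
        return -lts_quantile(1.0 - q, p)
    lo, hi = 0.0, 1.0
    while lts_cdf(hi, p) < q:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if lts_cdf(mid, p) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def qq_correlation(y, p):
    """Pearson correlation between ordered y and the LTS(p) quantiles t_(i)."""
    n = len(y)
    t = [lts_quantile(i / (n + 1.0), p) for i in range(1, n + 1)]
    ys = sorted(y)
    mt, my = sum(t) / n, sum(ys) / n
    sxy = sum((a - mt) * (b - my) for a, b in zip(t, ys))
    sxx = sum((a - mt) ** 2 for a in t)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def choose_shape(y, candidates):
    """Candidate p whose Q-Q plot is closest to a straight line."""
    return max(candidates, key=lambda p: qq_correlation(y, p))


# Hypothetical data lying exactly on the LTS(3.5) quantiles (location 10, scale 2).
data = [10.0 + 2.0 * lts_quantile(i / 16.0, 3.5) for i in range(1, 16)]
print(choose_shape(data, [2.0, 3.5, 10.0]))
```

With real data the correlations for neighbouring values of $p$ can be very close, so the selected value is best read as a plausible choice rather than a precise estimate.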

Conclusions
In this study, we show that when the underlying distribution of the study variable is not normal (e.g. the logistic distribution or the t-distribution), as is the case in most areas of application, MML integrated estimators can improve the efficiency of the estimators. In particular, we show that when the underlying distribution of the study variable is a long-tailed symmetric distribution, the MML integrated dual to product estimator $T_1$ can improve on the efficiency of the unbiased dual to product estimator $t_2$. The proposed estimator is also more efficient than the product estimators $\bar{y}_{pu}$ and $t_1$. We also show that the MML integrated dual to product estimator $T_1$ is robust to outliers as well as to other data anomalies.