Improving the efficiency of the ratio/product estimators of the population mean in stratified random samples

The efficiency of a statistic determines its efficacy. In stratified random sampling, many estimators for the population mean have been proposed. In this paper, we propose two new estimators, both of which are combined ratio/product estimators. We refer to our estimators as mixture estimators. We derive their mean square errors (MSEs) up to the first order of approximation. A comprehensive simulation study was carried out to show the effectiveness of our estimators compared with the conventional estimators that utilize auxiliary information. We also compared the performance of our estimators and some of the more popular competing estimators using real data. Both the simulations and the real data analysis showed that our estimators were more efficient than almost all existing estimators considered.

Subjects: Applied Mathematics; Mathematics Education; Statistics & Probability


ABOUT THE AUTHORS
Andrew Vieira and Brendon Bhagwandeen are graduate students in the Department of Mathematics and Statistics, The University of the West Indies, St Augustine Campus, Trinidad and Tobago.
Dr. Isaac Dialsingh is a lecturer in the Department of Mathematics and Statistics at the University of the West Indies, St Augustine Campus, Trinidad and Tobago.

PUBLIC INTEREST STATEMENT
Have you ever thought about how the estimate of a simple population parameter like the mean income is calculated under different sampling scenarios? It is important to note that the sampling scheme, whether simple random sampling, systematic sampling, stratified sampling or even cluster sampling, affects the correct computation of the mean. In this paper, we look at some estimators of the population mean under stratified sampling. In this type of sampling, there is usually an auxiliary variable that we readily observe and that may be correlated with the main variable of interest. This "additional" information can assist us in getting better estimates of the mean. In this paper, we propose two new estimators and show via simulations and real data analysis that our estimators are efficient under a variety of stratification configurations.

Introduction and notation
Sample surveys are usually deployed as the most cost-effective device for estimating a population parameter. There are many competing estimators for a population parameter. The best estimator is usually the one with the smallest mean squared error (MSE). The MSE of an estimator is the sum of its variance and the square of its bias. It therefore accounts for both precision and accuracy. The smaller the MSE, the better the estimator. Many estimators have been proposed for the estimation of the population mean under simple random and stratified random sampling. Some of the notable contributions to estimators in the sampling literature include Cochran (1940), Murthy (1967), Kadilar and Cingi (2005), and Singh and Vishwakarma (2007). This list is by no means exhaustive. For stratified samples, the use of an auxiliary variable ($X$) has been shown to improve the efficiency of the estimators of the population parameters for the variable of interest ($Y$). Auxiliary information is used in the design and estimation stages of a survey. This paper focuses on improving efficiency when estimating the population mean from stratified samples with the help of auxiliary information at the estimation stage.
We assume a stratified random sample of size $n$ is selected from a large multivariate population of size $N$. A simple random sample of size $n_i$ is taken without replacement from each stratum of size $N_i$, where $i = 1, 2, \ldots, k$ and $k$ is the number of strata in the population. Data in each stratum are assumed to come from a multivariate normal superpopulation, with the finite population correction taken as $f_i \approx 1$. Surveys often make use of an auxiliary variable ($X$), which is assumed to provide useful information in the estimation of the mean of the variable of interest ($Y$). The auxiliary variable in each stratum, $X_i$, is more readily available than the variable of interest, $Y_i$. For estimating the population mean $\bar{Y}$ of the variable of interest, assuming that the population mean $\bar{X}$ of the auxiliary variable is known, the following classical estimators in stratified sampling have been proposed (Hansen & Hurwitz, 1943).

Classical estimators for population mean
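The combined ratio estimator:
$$\hat{\bar{Y}}_{cr} = \bar{y}_{st}\,\frac{\bar{X}}{\bar{x}_{st}} \qquad (2.1)$$
and the combined product estimator:
$$\hat{\bar{Y}}_{cp} = \bar{y}_{st}\,\frac{\bar{x}_{st}}{\bar{X}} \qquad (2.2)$$
are by far the most popular estimators. We define $\bar{y}_{st}$ and $\bar{x}_{st}$ as the stratified sample means of the main and auxiliary variables, respectively. In these estimators, $\bar{y}_{st}/\bar{x}_{st}$ is the estimate of the ratio of the population means, while $\bar{y}_{st}\,\bar{x}_{st}$ is the estimate of the product of the population means. In addition,
$$\bar{y}_{st} = \sum_{i=1}^{k} w_i \bar{y}_i, \qquad \bar{x}_{st} = \sum_{i=1}^{k} w_i \bar{x}_i,$$
where $w_i = N_i/N$ is the stratum weight, and $\bar{y}_i$ and $\bar{x}_i$ are the stratum sample means for the main and auxiliary variables, respectively.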
The ratio estimator is used for the estimation of the population mean when Y and X are positively correlated to each other while the product estimator is used when Y and X are negatively correlated to each other.
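The two combined estimators can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy data and the value used for the known auxiliary population mean are assumptions made here for demonstration and do not come from the paper.

```python
from statistics import fmean


def stratified_mean(strata, weights):
    """Stratified sample mean: sum over strata of w_i times the stratum sample mean."""
    return sum(w * fmean(s) for w, s in zip(weights, strata))


def combined_ratio(y_strata, x_strata, weights, x_pop_mean):
    """Combined ratio estimator: y_st * X_bar / x_st."""
    y_st = stratified_mean(y_strata, weights)
    x_st = stratified_mean(x_strata, weights)
    return y_st * x_pop_mean / x_st


def combined_product(y_strata, x_strata, weights, x_pop_mean):
    """Combined product estimator: y_st * x_st / X_bar."""
    y_st = stratified_mean(y_strata, weights)
    x_st = stratified_mean(x_strata, weights)
    return y_st * x_st / x_pop_mean


# Hypothetical toy sample with three strata (values are made up for illustration).
y_strata = [[10.0, 12.0], [20.0, 22.0], [30.0, 34.0]]
x_strata = [[4.0, 6.0], [9.0, 11.0], [14.0, 16.0]]
weights = [0.2, 0.2, 0.6]   # stratum weights w_i = N_i / N
x_pop_mean = 12.5           # assumed known population mean of X

print(stratified_mean(y_strata, weights))
print(combined_ratio(y_strata, x_strata, weights, x_pop_mean))
print(combined_product(y_strata, x_strata, weights, x_pop_mean))
```

In this toy sample the stratified mean of the auxiliary variable falls below its assumed population mean, so the ratio estimator adjusts the stratified mean of the main variable upwards and the product estimator adjusts it downwards.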
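First-order approximations to the bias and MSE of the combined ratio/product estimators are derived. To do this, we let $\bar{y}_{st} = \bar{Y}(1 + \varepsilon_0)$ and $\bar{x}_{st} = \bar{X}(1 + \varepsilon_1)$, where $\varepsilon_0$ and $\varepsilon_1$ are errors which can be positive or negative such that $E(\varepsilon_0) = E(\varepsilon_1) = 0$. For stratified random sampling, the second-order moments $E(\varepsilon_0^2)$, $E(\varepsilon_1^2)$ and $E(\varepsilon_0 \varepsilon_1)$ are functions of the stratum weights, the stratum variances and $\rho_i$, where $\rho_i$ is the correlation between the two (the main and auxiliary) variables in the $i$th stratum. In addition, it is assumed that the sample is sufficiently large that $|\varepsilon_0|$ and $|\varepsilon_1|$ are small, so that terms involving $\varepsilon_0$ and/or $\varepsilon_1$ to degrees higher than two are considered negligible.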
By substituting the expressions for $\bar{y}_{st}$ and $\bar{x}_{st}$ in terms of $\varepsilon_0$ and $\varepsilon_1$ into Eqs. (2.1) and (2.2), the following are obtained:
$$\hat{\bar{Y}}_{cr} = \bar{Y}(1 + \varepsilon_0)(1 + \varepsilon_1)^{-1}, \qquad \hat{\bar{Y}}_{cp} = \bar{Y}(1 + \varepsilon_0)(1 + \varepsilon_1).$$
Assuming $|\varepsilon_1| < 1$ and expanding $(1 + \varepsilon_1)^{-1} = 1 - \varepsilon_1 + \varepsilon_1^2 - \cdots$, we obtain the first-order bias and MSE of $\hat{\bar{Y}}_{cr}$. Here, $R = \bar{Y}/\bar{X}$ is the ratio of the population means. Similarly, the corresponding result for $\hat{\bar{Y}}_{cp}$ is obtained up to the first order of approximation (Singh & Mangat, 2013). Therefore, up to the first-order approximation, $\mathrm{MSE}(\hat{\bar{Y}}_{cr}) < \mathrm{MSE}(\bar{y}_{st})$ if and only if $C > \tfrac{1}{2}$, and $\mathrm{MSE}(\hat{\bar{Y}}_{cp}) < \mathrm{MSE}(\bar{y}_{st})$ if and only if $C < -\tfrac{1}{2}$. This indicates that the combined ratio/product estimators are relatively more efficient than $\bar{y}_{st}$, the unbiased stratified sample mean, when $C > \tfrac{1}{2}$ and $C < -\tfrac{1}{2}$, respectively. Thus, $\hat{\bar{Y}}_{cr}$ and $\hat{\bar{Y}}_{cp}$ will not improve on $\bar{y}_{st}$ when $-\tfrac{1}{2} \le C \le \tfrac{1}{2}$. First-order MSEs are also available for the combined regression and separate ratio estimators.

The improvement of $\bar{y}_{st}$ has been an ongoing area of research. The idea for our proposed estimators comes from a paper by Shirley, Sahai, and Dialsingh (2014), where a design parameter $\theta$ was used to improve the estimation of the population mean under simple random sampling. This motivates our proposed estimators of the population mean under stratified random sampling. Thus, the aim of this study is to improve on $\bar{y}_{st}$ as well as the ratio and product estimators using auxiliary information.

Other estimators in literature
Bahl and Tuteja (1991) proposed ratio/product exponential estimators for estimating the mean of a finite population using a single auxiliary variable.
The MSEs of these estimators are known to the first order of approximation. Upadhyaya, Singh, Chatterjee, and Yadav (2011) proposed an exponential ratio-type estimator, together with its first-order MSE.

Proposed estimators
The basis for our estimators comes from the results of Shirley et al. (2014) and our desire to extend this type of estimator to stratified sampling. Since $\hat{\bar{Y}}_{cr}$ and $\hat{\bar{Y}}_{cp}$ are more efficient than $\bar{y}_{st}$ when $C > \tfrac{1}{2}$ and $C < -\tfrac{1}{2}$, respectively, single-parameter linear combinations of $\hat{\bar{Y}}_{cr}$ and $\bar{y}_{st}$, and of $\hat{\bar{Y}}_{cp}$ and $\bar{y}_{st}$, are used as our proposed estimators in Eqs. (4.1) and (4.2), respectively. In Eqs. (4.1) and (4.2), $\theta$ is the design parameter for the proposed estimators and is to be assigned an optimal value which minimises the first-order MSEs of the proposed estimators (Shirley et al., 2014). Based on a guess for the value of $C$, a suitable value of $\theta$ can be obtained.

Bias and mean square error of proposed estimators
The first-order approximations of the bias and MSE are derived using the notation introduced in Section 1 of this paper by substituting the expressions for $\bar{y}_{st}$ and $\bar{x}_{st}$ into Eqs. (4.1) and (4.2). Minimizing Eq. (5.3) with respect to $\theta$, the optimal value of $\theta$ is $C - 1$. Therefore, in the proposed estimator in Eq. (4.1), $\theta = C^{*} - 1$ is used, where $C^{*}$ is the guess of $C$. Minimizing Eq. (5.6) with respect to $\theta$ gives the optimal value of $\theta$ as $-(C + 1)$. Therefore, in the proposed estimator in Eq. (4.2), $\theta = -(C^{*} + 1)$ is used, where $C^{*}$ is the guess of $C$.
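To illustrate how optima of this kind can arise, the following sketch works through a first-order calculation. It rests on assumptions that are not taken from the text above: that the ratio-type proposal has the linear form $(1+\theta)\hat{\bar{Y}}_{cr} - \theta\,\bar{y}_{st}$, that the product-type proposal has the form $(1+\theta)\hat{\bar{Y}}_{cp} - \theta\,\bar{y}_{st}$, and that $C = E(\varepsilon_0\varepsilon_1)/E(\varepsilon_1^2)$. The labels $T_r$ and $T_p$ are illustrative names only. Under these assumptions the calculation reproduces the optimal values and the thresholds $\pm\tfrac{1}{2}$ quoted in the paper, which suggests, but does not establish, that the assumed forms are close to the intended ones.

```latex
% Assumed forms (hypothetical):
%   T_r = (1+\theta)\hat{\bar{Y}}_{cr} - \theta\bar{y}_{st},
%   T_p = (1+\theta)\hat{\bar{Y}}_{cp} - \theta\bar{y}_{st},
%   C   = E(\varepsilon_0\varepsilon_1) / E(\varepsilon_1^2).
\begin{align*}
T_r - \bar{Y} &\approx \bar{Y}\,\bigl[\varepsilon_0 - (1+\theta)\,\varepsilon_1\bigr], \\
\mathrm{MSE}(T_r) &\approx \bar{Y}^2\bigl[E(\varepsilon_0^2)
    + (1+\theta)^2 E(\varepsilon_1^2) - 2(1+\theta)\,E(\varepsilon_0\varepsilon_1)\bigr], \\
\frac{\partial\,\mathrm{MSE}(T_r)}{\partial\theta} = 0
    &\;\Longrightarrow\; 1+\theta = C \;\Longrightarrow\; \theta = C - 1, \\[6pt]
T_p - \bar{Y} &\approx \bar{Y}\,\bigl[\varepsilon_0 + (1+\theta)\,\varepsilon_1\bigr], \\
\mathrm{MSE}(T_p) &\approx \bar{Y}^2\bigl[E(\varepsilon_0^2)
    + (1+\theta)^2 E(\varepsilon_1^2) + 2(1+\theta)\,E(\varepsilon_0\varepsilon_1)\bigr], \\
\frac{\partial\,\mathrm{MSE}(T_p)}{\partial\theta} = 0
    &\;\Longrightarrow\; 1+\theta = -C \;\Longrightarrow\; \theta = -(C + 1), \\[6pt]
\text{at the optimum:}\quad \mathrm{MSE}_{\min}
    &\approx \bar{Y}^2\bigl[E(\varepsilon_0^2) - C^2 E(\varepsilon_1^2)\bigr]
    \;\le\; \mathrm{MSE}(\bar{y}_{st}), \\
\text{at } \theta = 0:\quad \mathrm{MSE}(\hat{\bar{Y}}_{cr})
    &\approx \bar{Y}^2\bigl[E(\varepsilon_0^2) + (1 - 2C)\,E(\varepsilon_1^2)\bigr]
    < \mathrm{MSE}(\bar{y}_{st}) \iff C > \tfrac{1}{2}, \\
\mathrm{MSE}(\hat{\bar{Y}}_{cp})
    &\approx \bar{Y}^2\bigl[E(\varepsilon_0^2) + (1 + 2C)\,E(\varepsilon_1^2)\bigr]
    < \mathrm{MSE}(\bar{y}_{st}) \iff C < -\tfrac{1}{2}.
\end{align*}
```

Under the assumed forms, the minimum first-order MSE never exceeds that of $\bar{y}_{st}$ for any value of $C$, which is consistent with the motivation for introducing the design parameter $\theta$.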

Comparison of the estimators
Algebraic comparison of the MSEs of the estimators is not feasible; therefore, a simulation exercise is undertaken. We compared the estimated MSEs of $\bar{y}_{st}$, the combined ratio/product estimators, the combined and separate regression estimators and the proposed estimators from 10,000 sets of simulated data with sample sizes $n = 30$ and $60$. These sample sizes are selected since the combined estimators are recommended when the sample size within each stratum is small (20 or fewer) (Shirley et al., 2014). The efficiency of each of these estimators relative to $\bar{y}_{st}$ was evaluated as the ratio of the estimated MSE of $\bar{y}_{st}$ to the estimated MSE of the estimator, expressed as a percentage. It is assumed that the parent population is large and that data from each stratum come from a multivariate normal population, with parameters chosen for simplicity of illustration. Usually the variability of the auxiliary variable $X$ is less than that of the main variable $Y$.
We vary the correlation between the main and auxiliary variables in each stratum. We define $\rho_i$ to be the correlation between $X$ and $Y$ in the $i$th stratum, with $|\rho_i| = 0.1, 0.4, 0.7$.
For simplicity, we used three strata. Therefore, where equal allocation is used, the strata weights were $w_1 = w_2 = w_3 = 1/3$. For proportional allocation, we used two weight configurations. Configuration 1 used the strata weights $w_1 = 1/5$, $w_2 = 1/5$, $w_3 = 3/5$, while Configuration 2 used a different set of weights. The guess $C^{*}$ of $C$ includes a term $r^{*}$ that accommodates under/over guessing (Shirley et al., 2014). The following values of $r^{*}$ are used: $0, \pm 0.02, \pm 0.06, \pm 0.08$. The statistical software package R (R Development Core Team, 2008) was used for the simulations.
The results of these simulations, including those for negative values of $\rho_i$, are summarized in Tables 1 and 2 (as well as Tables A1-A12 in Appendix A).
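A simulation of this kind can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the stratum means and standard deviations, the number of replications, the seed and all function names are hypothetical choices made here, and only $\bar{y}_{st}$ and the combined ratio estimator are compared. Samples are drawn directly from bivariate normal superpopulations, which corresponds to treating the finite population correction as negligible.

```python
import math
import random
from statistics import fmean


def draw_stratum(n, mu_y, mu_x, sd_y, sd_x, rho, rng):
    """Draw n (y, x) pairs from a bivariate normal with correlation rho."""
    ys, xs = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu_x + sd_x * z1)
        ys.append(mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return ys, xs


def relative_efficiency(strata, weights, n_i, reps=2000, seed=1):
    """Monte Carlo relative efficiency (in %) of the combined ratio estimator vs y_st."""
    rng = random.Random(seed)
    y_bar = sum(w * s["mu_y"] for w, s in zip(weights, strata))  # true mean of Y
    x_bar = sum(w * s["mu_x"] for w, s in zip(weights, strata))  # known mean of X
    sq_err_st = sq_err_cr = 0.0
    for _ in range(reps):
        y_st = x_st = 0.0
        for w, s in zip(weights, strata):
            ys, xs = draw_stratum(n_i, s["mu_y"], s["mu_x"],
                                  s["sd_y"], s["sd_x"], s["rho"], rng)
            y_st += w * fmean(ys)
            x_st += w * fmean(xs)
        y_cr = y_st * x_bar / x_st  # combined ratio estimator
        sq_err_st += (y_st - y_bar) ** 2
        sq_err_cr += (y_cr - y_bar) ** 2
    # Ratio of estimated MSEs (the common 1/reps factor cancels), as a percentage.
    return 100.0 * sq_err_st / sq_err_cr


def make_strata(rho):
    """Hypothetical stratum parameters (not the values used in the paper)."""
    base = [(40.0, 20.0, 4.0, 2.0), (50.0, 25.0, 5.0, 2.5), (60.0, 30.0, 6.0, 3.0)]
    return [dict(mu_y=a, mu_x=b, sd_y=c, sd_x=d, rho=rho) for a, b, c, d in base]


weights = [1 / 3, 1 / 3, 1 / 3]  # three strata, equal weights
re_high = relative_efficiency(make_strata(0.7), weights, n_i=10)
re_low = relative_efficiency(make_strata(0.1), weights, n_i=10)
print(f"RE at rho = 0.7: {re_high:.0f}%   RE at rho = 0.1: {re_low:.0f}%")
```

With these assumed parameters, the combined ratio estimator should come out more efficient than $\bar{y}_{st}$ (relative efficiency above 100%) for the strong positive correlation and less efficient for the weak one, in line with the first-order condition on $C$ discussed earlier. Adding further estimators to the loop follows the same pattern.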

Application to real data
To assess the performance of our proposed estimators against classical (and competing) estimators, we applied our methods to a real dataset. Table 3 gives the summary statistics of the dataset from Murthy (1967). The MSE and relative efficiency values are given in Table 4.

Table 1. Relative efficiencies (in %) of the ratio estimators when $n = 30$ and $r^{*} = 0$.
Table 2. Relative efficiencies (in %) of the product estimators when $n = 30$ and $r^{*} = 0$.
The relative efficiencies in Tables 1 and 2 show the desired improvement that the proposed estimators achieve over $\hat{\bar{Y}}_{cr}$ and $\hat{\bar{Y}}_{cp}$ (and how much better the proposed estimators are than $\hat{\bar{Y}}_{creg}$ and $\hat{\bar{Y}}_{sreg}$). When $-\tfrac{1}{2} \le C \le \tfrac{1}{2}$, the proposed estimators are considerably more efficient than $\bar{y}_{st}$, despite the combined ratio/product estimators $\hat{\bar{Y}}_{cr}$ and $\hat{\bar{Y}}_{cp}$ being worse than $\bar{y}_{st}$ (which uses no auxiliary information). Moreover, when $|C|$ is significantly less than $\tfrac{1}{2}$, unlike $\hat{\bar{Y}}_{creg}$ and $\hat{\bar{Y}}_{sreg}$, the proposed estimators do make proper use of the auxiliary information and produce improved results.
Turning to the further simulations, the relative efficiencies in Tables A1-A12 show that the proposed estimators consistently performed better than the combined ratio/product estimators. It is also observed that, regardless of the sample size or stratum weights used, for similar correlation values within each stratum the proposed estimators performed better than the combined and separate regression estimators. However, when the correlation varies across strata, and given the sensitivity to the under/over guess of $C$, the performance of the proposed estimators seems to fluctuate relative to the regression estimators.
Applying our proposed estimators to real data, we observe that they perform remarkably better than the classical estimators $\bar{y}_{st}$, $\hat{\bar{Y}}_{cr}$ and $\hat{\bar{Y}}_{cp}$, even when $\hat{\bar{Y}}_{cp}$ is less efficient than $\bar{y}_{st}$. The proposed estimators even matched the combined regression estimator $\hat{\bar{Y}}_{creg}$ for this dataset. In addition, our proposed estimators outperformed other existing exponential-type estimators.