Use of Auxiliary Variables and Asymptotically Optimum Estimators in Double Sampling

This paper explores the need for exploiting auxiliary variables in sample surveys and for using asymptotically optimum estimators in double sampling to increase the efficiency of estimators. The study proposes two estimators that use two auxiliary variables in two-phase sampling when no information about the auxiliary variables is available at the population level. Expressions for the Mean Squared Error (MSE) of the proposed estimators are derived to the first order of approximation. The efficiency of the proposed and existing estimators is compared empirically through their minimum variances and percent relative efficiencies. The proposed estimators are found to be more efficient than the mean per unit estimator and than earlier estimators that do not use auxiliary variables or are not asymptotically optimum. In addition, asymptotically optimum estimators that use a single auxiliary variable are found to be more efficient than estimators with two auxiliary variables that are not asymptotically optimum.


Introduction
In survey research, there are times when information is available on every unit in the population. If a variable that is known for every unit of the population is not a variable of interest but is instead used to improve the sampling plan or to enhance estimation of the variables of interest, it is called an auxiliary variable. The auxiliary information about a study population may include a known variable to which the variable of interest, called the study variable, is approximately related. This information may be used at the planning stage of the survey, in the estimation procedure, or at both stages.
The estimation of population parameters with greater precision is a persistent issue in sampling theory. The precision of estimates can be improved by increasing the sample size, but doing so tends to undermine the benefits of sampling. Precision may instead be increased by using an appropriate estimation procedure that exploits auxiliary information closely related to the study variable and by employing estimators that are asymptotically optimum. Laplace (1820) was the first to use auxiliary information in a ratio-type estimator. Watson (1937) used the regression method of estimation to estimate the average area of the leaves on a plant. Cochran (1940) used auxiliary information in single-phase sampling to develop the ratio estimator of the population mean. The ratio estimator is appropriate when the study variable and the auxiliary variable are highly positively correlated and the regression line passes through the origin. Robson (1957) and Murthy (1964) worked independently on the usual product estimator of the population mean. Intuitively, estimation of the variable of interest can be improved by using the information supplied by a related variable (an auxiliary, supplementary, or concomitant variable). When two or more auxiliary variables are available, many estimators may be defined by linking together different estimators such as ratio, product or regression, each exploiting a single variable. These mixed estimators have been found to perform better than the individual estimators. Mohanty (1967) was the first to use this approach, proposing a mixed estimator with two auxiliary variables.
Many other contributions are present in the sampling literature and, recently, some new estimators have appeared together with asymptotic expressions for their mean square errors. Here we mention, among others, Upadhyaya et al (1992), Tracy and Singh (1999), Radhey et al (2002), Singh and Espejo (2007), Samiuddin and Hanif (2007) and Singh et al (2010). Estimators for the no-information case that use two auxiliary variables include those of Samiuddin and Hanif (2007) and Swain (2012). Motivated by these recent proposals, in this paper we propose, for the case where two auxiliary variables are available, some new estimators derived from those of Mohanty (1967), Mukerjee et al (1987), and Singh and Espejo (2007). The paper explores the need for exploiting auxiliary variables and asymptotically optimum estimators to increase the efficiency of estimators in double sampling. The paper is organized as follows: Section 2 introduces the methods and estimators considered in the study. Section 3 presents the notation and the two proposed estimators and derives, up to the first degree of approximation, expressions for their mean square errors. Section 4 is devoted to the empirical study of the efficiency of the proposed estimators. Section 5 discusses the results of the empirical analysis. Section 6 gives the conclusion and recommendations.

Research Design
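Consider a finite population $U = (U_1, U_2, \ldots, U_N)$ of size $N$ with three characters $(y, x, z)$ taking the values $y_i$, $x_i$, $z_i$ respectively on the unit $U_i$ $(i = 1, 2, \ldots, N)$. The purpose is to estimate the population mean of a study variable $y$ in the presence of two auxiliary variables $x$ and $z$. The population means $\bar{X}$ and $\bar{Z}$ of $x$ and $z$ respectively are not known, so a double sampling technique must be adopted. Assuming simple random sampling without replacement (SRSWOR) at each phase, the two-phase sampling scheme runs as follows. A first-phase sample $s'$ ($s' \subset U$) of fixed size $n_1$ is drawn from $U$ to observe both $x$ and $z$ in order to estimate $\bar{X}$ and $\bar{Z}$. Given $s'$, a second-phase sample $s$ ($s \subset s'$) of fixed size $n_2$ is drawn from $s'$ to observe $y$ in order to estimate the population mean $\bar{Y}$.
Now, define the population means of $y$, $x$ and $z$ respectively as:
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{Z} = \frac{1}{N}\sum_{i=1}^{N} z_i$$
The finite population variances of $y$, $x$ and $z$ respectively are:
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{Y})^2, \qquad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{X})^2, \qquad S_z^2 = \frac{1}{N-1}\sum_{i=1}^{N} (z_i - \bar{Z})^2$$
Moreover, the covariances between $y$ and $x$, $y$ and $z$, and $x$ and $z$ are given by:
$$S_{yx} = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}), \qquad S_{yz} = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{Y})(z_i - \bar{Z}), \qquad S_{xz} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{X})(z_i - \bar{Z})$$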
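The two-phase scheme described above can be sketched in code. The following is a minimal illustration only, assuming Python's standard `random` module as the source of SRSWOR draws; the function names `two_phase_sample` and `population_cov`, the toy population and the sample sizes are hypothetical and are not taken from the study.

```python
import random


def two_phase_sample(pop_size, n1, n2, seed=0):
    """Draw a first-phase SRSWOR sample of n1 unit indices from the population,
    then a second-phase SRSWOR subsample of n2 indices from the first-phase sample."""
    if not (0 < n2 <= n1 <= pop_size):
        raise ValueError("need 0 < n2 <= n1 <= N")
    rng = random.Random(seed)
    first = rng.sample(range(pop_size), n1)   # first-phase sample s'
    second = rng.sample(first, n2)            # second-phase sample s, a subset of s'
    return first, second


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def population_cov(a, b):
    """Finite population covariance with divisor N - 1 (variance when a is b)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


# Hypothetical toy population: y is the study variable, x and z are auxiliary.
x = [float(i) for i in range(1, 21)]
z = [10.0 - 0.5 * i for i in range(1, 21)]
y = [2.0 * xi + 1.0 for xi in x]

first, second = two_phase_sample(len(y), n1=10, n2=4)
x_bar_1 = mean(x[i] for i in first)    # first-phase mean of x
z_bar_1 = mean(z[i] for i in first)    # first-phase mean of z
y_bar_2 = mean(y[i] for i in second)   # second-phase mean of y
```

In this sketch only `x` and `z` are read for the first-phase units, and `y` is read only for the second-phase units, which mirrors the cost argument for double sampling.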

Analytical Techniques
The analytical technique adopted in this study is relative efficiency. It is used where a comparison is made between a given procedure and a notional "best possible" procedure. Gupta (2011) defined relative efficiency as a statistical tool used to measure the efficiency of one estimator over another. Following Gupta (2011) and Singh et al (2010), the percent relative efficiency of estimator $\alpha$ with respect to estimator $\beta$ is expressed as the ratio of the variance (or mean square error) of $\beta$ to that of $\alpha$, multiplied by 100. In this research, the Percent Relative Efficiency (PRE) is therefore used to measure the efficiency of the proposed and previous estimators with respect to the mean per unit estimator.
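As an illustration, the PRE of an estimator with respect to a reference estimator can be computed as sketched below. This assumes the usual convention that the reference MSE is the numerator, so that values above 100 indicate a gain over the reference; the function name and the MSE values are hypothetical and are not results from the study.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator with respect to a reference estimator,
    assumed here to be 100 * MSE(reference) / MSE(estimator)."""
    if mse_reference <= 0 or mse_estimator <= 0:
        raise ValueError("mean square errors must be positive")
    return 100.0 * mse_reference / mse_estimator


# Hypothetical MSE values, for illustration only.
mse_mean_per_unit = 4.0
mse_candidate = 2.0
pre = percent_relative_efficiency(mse_mean_per_unit, mse_candidate)
```

Under this convention the reference estimator always has a PRE of 100 with respect to itself.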

Estimators Used in Sampling Survey
In this section we analyze the performance of the proposed estimators and of other popular existing estimators by numerically evaluating their mean square errors (MSE) to the first order of approximation. For a fixed sample size, we consider the efficiency of estimators that (i) use no auxiliary variable; (ii) exploit a single auxiliary variable; (iii) use two auxiliary variables.

Sampling without Auxiliary Variable
The mean per unit estimator is perhaps the oldest estimator in the history of sample surveys. The estimator for a sample of size $n$ drawn from a population of size $N$ is defined as:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad (2.1)$$
The mean square error (equal to the variance, since the estimator is unbiased) can be immediately written as:
$$\mathrm{Var}(\bar{y}) = \left(\frac{1}{n} - \frac{1}{N}\right) S_y^2$$
where $S_y^2$ is the finite population variance of $y$. Searle (1964) presented a modified version of the mean per unit estimator, $k\bar{y}$, where $k$ is a constant determined by minimizing the mean square error of (2.4).
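The gain from a Searle-type constant can be illustrated numerically. The sketch below assumes the estimator has the form $k\bar{y}$, so that its MSE is $k^2 \theta S_y^2 + (k-1)^2 \bar{Y}^2$ with $\theta = 1/n - 1/N$, and that the minimizing constant is $k = 1/(1 + \theta C_y^2)$ with $C_y^2 = S_y^2/\bar{Y}^2$. The function names and the numbers are hypothetical.

```python
def mean_per_unit_variance(s2_y, n, N):
    """Variance of the sample mean under SRSWOR: (1/n - 1/N) * S_y^2."""
    return (1.0 / n - 1.0 / N) * s2_y


def scaled_mean_mse(k, s2_y, y_bar, n, N):
    """MSE of k * ybar: variance term plus squared bias term."""
    return k * k * mean_per_unit_variance(s2_y, n, N) + (k - 1.0) ** 2 * y_bar ** 2


def optimum_k(s2_y, y_bar, n, N):
    """Constant minimizing scaled_mean_mse: 1 / (1 + theta * C_y^2)."""
    theta = 1.0 / n - 1.0 / N
    return 1.0 / (1.0 + theta * s2_y / y_bar ** 2)


# Hypothetical population quantities, for illustration only.
s2_y, y_bar, n, N = 4.0, 2.0, 4, 20
var_plain = mean_per_unit_variance(s2_y, n, N)
k_opt = optimum_k(s2_y, y_bar, n, N)
mse_opt = scaled_mean_mse(k_opt, s2_y, y_bar, n, N)
```

The optimum constant depends on the population coefficient of variation, so in practice it has to be known in advance or estimated, which is the sense in which such estimators are optimum only asymptotically.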

Sampling with One Auxiliary Variable
Auxiliary information is often used to improve the efficiency of estimators through the product, regression and ratio methods of estimation in survey sampling. Robson (1957) introduced the product estimator (2.5) for the case where the study and auxiliary variables are highly negatively correlated. Sukhatme (1962) used an auxiliary variable in his ratio-type estimator (2.7) for two-phase sampling. Srivastava (1971) developed a general ratio estimator (2.10). Singh and Espejo (2007) developed a ratio-product estimator (2.12).

Sampling with Two Auxiliary Variables
Various authors have proposed mixed-type estimators, that is, estimators that combine ratio and regression estimators in some fashion. These mixed estimators perform better than the individual estimators. Mohanty (1967) proposed a regression-ratio estimator. Mukerjee et al (1987) developed three regression-type estimators, one of which was for the situation where no auxiliary information is available. Hanif et al (2010) proposed an estimator (2.18) for two-phase sampling.
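For orientation, the kinds of estimators listed above can be sketched as simple functions of the phase means. The forms below are the ones commonly quoted for such estimators in the two-phase literature and are stated here as assumptions; they are not a reproduction of the paper's own expressions. In this sketch `y2`, `x2`, `z2` denote second-phase sample means, `x1`, `z1` first-phase sample means, `b_yx` and `b_yz` regression coefficients, and `a` and `k` tuning constants; all function names are hypothetical.

```python
def product_estimator(y2, x1, x2):
    """Product-type estimator, suited to negative correlation (assumed form)."""
    return y2 * x2 / x1


def ratio_estimator(y2, x1, x2):
    """Ratio-type estimator, suited to positive correlation (assumed form)."""
    return y2 * x1 / x2


def general_ratio_estimator(y2, x1, x2, a):
    """General ratio estimator with exponent a (assumed form)."""
    return y2 * (x1 / x2) ** a


def ratio_product_estimator(y2, x1, x2, k):
    """Weighted mix of the ratio and product adjustments (assumed form)."""
    return y2 * (k * x1 / x2 + (1.0 - k) * x2 / x1)


def regression_ratio_estimator(y2, x1, x2, z1, z2, b_yx):
    """Regression on x combined with a ratio adjustment on z (assumed form)."""
    return (y2 + b_yx * (x1 - x2)) * z1 / z2


def double_regression_estimator(y2, x1, x2, z1, z2, b_yx, b_yz):
    """Regression adjustments on both x and z (assumed form)."""
    return y2 + b_yx * (x1 - x2) + b_yz * (z1 - z2)


def regression_ratio_product_estimator(y2, x1, x2, z1, z2, b_yx, k):
    """Regression on x combined with a ratio-product adjustment on z (assumed form)."""
    return (y2 + b_yx * (x1 - x2)) * (k * z1 / z2 + (1.0 - k) * z2 / z1)
```

Every function returns `y2` unchanged when the first-phase and second-phase auxiliary means coincide, which is the sense in which each estimator only adjusts the second-phase mean of the study variable.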
For notational purposes, it is assumed that the sample means of the study variable and the auxiliary variables can be approximated from their population means as in (i), where $\bar{x}_h$ and $\bar{z}_h$ are the sample means of the auxiliary variables $x$ and $z$ at the $h$-th phase, for $h = 1$ and $2$, and $\bar{y}_2$ is the sample mean of the study variable $y$ at the second phase. Further notation is given in (ii) and (iii). The proposed estimators are given by (3.1) and (3.2), each of which contains a suitable constant lying between 0 and 1.
To obtain the MSE of the first proposed estimator to the first degree of approximation, equation (3.1) is expressed in terms of the error terms of (i), which gives (3.4). The negative exponent in (3.4) is expanded using the method of indeterminate coefficients, which gives (3.5). Expanding the right-hand side of (3.5), substituting (i) and retaining terms of first degree in the error terms gives (3.6). Subtracting $\bar{Y}$ from both sides of (3.6), squaring both sides and taking expectations gives the MSE of the estimator up to the first order of approximation, (3.7). Expanding the right-hand side of (3.7) and applying the notation of (ii) and (iii) gives (3.8). The optimum value of the constant is obtained by differentiating (3.8) with respect to the constant and setting the derivative to zero, which gives (3.9). Substituting (3.9) in (3.8) and simplifying gives the minimum mean square error of (3.1).
Similarly, to obtain the MSE of the second proposed estimator to the first degree of approximation, (i) is substituted in (3.2), which gives (3.11). The negative exponent in (3.11) is expanded using the method of indeterminate coefficients, which gives (3.12). Expanding the right-hand side of (3.12) and retaining terms of first degree in the error terms gives (3.13). Subtracting $\bar{Y}$ from both sides of (3.13), squaring both sides and taking expectations gives (3.14). Expanding the right-hand side of (3.14) and applying the notation above gives (3.15). The optimum value of the constant is obtained by differentiating (3.15) with respect to the constant and setting the derivative to zero, which gives (3.16). Substituting (3.16) in (3.15) and simplifying gives the minimum mean square error of (3.2).
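The final step of each derivation, choosing the constant that minimizes a first-order MSE, can be illustrated generically. The sketch below assumes only that the first-order MSE is a quadratic in the constant, written as $A - 2aB + a^2 C$ with $C > 0$; the coefficients `A`, `B`, `C` and the function names are hypothetical placeholders and do not correspond to the paper's expressions.

```python
def quadratic_mse(a, A, B, C):
    """Generic first-order MSE as a quadratic in the constant a."""
    return A - 2.0 * a * B + a * a * C


def minimise_quadratic_mse(A, B, C):
    """Setting the derivative -2B + 2aC to zero gives a = B / C,
    and substituting back gives the minimum value A - B^2 / C."""
    if C <= 0:
        raise ValueError("C must be positive for a minimum to exist")
    a_opt = B / C
    return a_opt, A - B * B / C


# Hypothetical coefficients, for illustration only.
a_opt, mse_min = minimise_quadratic_mse(A=10.0, B=4.0, C=2.0)
```

Because the minimizing constant is a ratio of population moments, it is normally unknown and has to be replaced by a sample estimate, which leaves the first-order MSE unchanged.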

Empirical Study
To analyze the performance of the various estimators of the population mean $\bar{Y}$ of the study variable $y$, we considered two data sets, Data Set 1 and Data Set 2. The population described in Data Set 2 shows, compared with that in Data Set 1, higher variability and higher correlation between the variables. In particular, the high variability in the auxiliary variables may make the first-order mean square error inaccurate. There is therefore always a need to ensure that the auxiliary variables are highly correlated with the study variable and that the population under consideration is homogeneously distributed; where there is no correlation between the auxiliary and study variables, the application of double sampling may be futile. Where the study and auxiliary variables are correlated but the population is not homogeneously distributed, stratified double sampling will be more appropriate.
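Whether a first-order MSE is accurate for a given population can be checked by simulation. The sketch below does this for the simple two-phase ratio estimator rather than for the proposed estimators, using a synthetic population generated in the code; the population, the sample sizes and the function names are hypothetical. The first-order expression used, $\theta_2 S_y^2 + (\theta_2 - \theta_1)(R^2 S_x^2 - 2 R S_{xy})$ with $\theta_h = 1/n_h - 1/N$ and $R = \bar{Y}/\bar{X}$, is the standard textbook approximation and is stated here as an assumption.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Finite population covariance with divisor N - 1."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def first_order_mse_ratio(y, x, n1, n2):
    """First-order MSE of the two-phase ratio estimator ybar2 * xbar1 / xbar2."""
    N = len(y)
    theta1, theta2 = 1.0 / n1 - 1.0 / N, 1.0 / n2 - 1.0 / N
    R = mean(y) / mean(x)
    return theta2 * cov(y, y) + (theta2 - theta1) * (R * R * cov(x, x) - 2.0 * R * cov(x, y))


def simulated_mse_ratio(y, x, n1, n2, reps, seed=1):
    """Monte Carlo MSE of the same estimator under two-phase SRSWOR."""
    rng = random.Random(seed)
    N, y_bar = len(y), mean(y)
    total = 0.0
    for _ in range(reps):
        first = rng.sample(range(N), n1)    # first-phase sample
        second = rng.sample(first, n2)      # second-phase subsample
        x1 = mean([x[i] for i in first])
        x2 = mean([x[i] for i in second])
        y2 = mean([y[i] for i in second])
        total += (y2 * x1 / x2 - y_bar) ** 2
    return total / reps


# Synthetic population with strong positive correlation between y and x.
gen = random.Random(0)
x = [gen.uniform(50.0, 150.0) for _ in range(500)]
y = [2.0 * xi + gen.gauss(0.0, 10.0) for xi in x]

approx = first_order_mse_ratio(y, x, n1=100, n2=30)
sim = simulated_mse_ratio(y, x, n1=100, n2=30, reps=2000)
```

Comparing `sim` with `approx` on populations of increasing variability gives a rough indication of when the first-order approximation starts to break down.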
Furthermore, the value of supplementary information for improving the performance of an estimator cannot be overemphasized, but it is worth noting that asymptotically optimum estimators performed better than non-asymptotically optimum estimators. In Table 4.2, estimators $t_{13}$ and $t_{14}$ used only one auxiliary variable but performed better than estimators $t_{21}$ and $t_{22}$, which used two auxiliary variables. The two mean per unit estimators considered in this study also show that the asymptotically optimum estimator $t_{02}$ has an advantage over the non-asymptotically optimum estimator $t_{01}$. The performance of the asymptotically optimum and non-asymptotically optimum estimators is shown in Table 4.3. It reveals that the asymptotically optimum estimators ($t_{13}$, $t_{14}$, $t_{23}$, $t_1$ and $t_2$) perform better than the non-asymptotically optimum estimators ($t_{01}$, $t_{11}$, $t_{12}$, $t_{21}$ and $t_{22}$), except for the mean per unit estimator $t_{02}$.
The estimator $t_{13}$ performed as well as $t_{14}$. The first proposed estimator, $t_1$, uses the second auxiliary variable for the regression and the first auxiliary variable for the ratio-product estimator, and it performed better than the estimators $t_{01}$, $t_{02}$, $t_{11}$, $t_{12}$, $t_{21}$ and $t_{22}$. The second proposed estimator, $t_2$, is a regression-cum-regression and product-cum-ratio estimator; it gave higher precision than all the other estimators considered in this study except $t_{23}$, with which it gave equal precision. The estimator $t_{23}$ is a regression and ratio-cum-product estimator that uses the first auxiliary variable for the regression and the second auxiliary variable for the ratio-product estimator. Perry (2007) asserted that, when two or more auxiliary variables are available, many estimators may be defined by linking together different estimators such as ratio, product or regression, each one exploiting a single variable.

Conclusions
In the course of the research, two asymptotically optimum estimators that use two auxiliary variables were proposed for increasing the efficiency of estimators in double sampling.
The study reveals that where the study and auxiliary variables are correlated but the population is not homogeneously distributed, stratified double sampling will be more appropriate.

Table 4.1. Percent relative efficiency of different estimators compared to mean per unit estimator

MSE = Mean Square Error; PRE = Percent Relative Efficiency

Table 4.2. Percent relative efficiency of non-asymptotically optimum and asymptotically optimum estimators compared to mean per unit estimator

From Table 4.1, in the first population, the estimators $t_{11}$, $t_{12}$, $t_{13}$, $t_{14}$, $t_{21}$, $t_{22}$, $t_{23}$, $t_1$ and $t_2$, which use supplementary (auxiliary variable) information, showed superiority over the two estimators ($t_{01}$ and $t_{02}$) that do not use such information. In the second population, all the estimators except $t_{11}$ and $t_{12}$ showed an advantage over $t_{01}$ and $t_{02}$, which do not use the auxiliary variables. The discrepancy in the performance of $t_{11}$ and $t_{12}$ is probably caused by the different types of populations considered.