Estimation of Population Mean in the Presence of Non-Response and Measurement Error

Classical survey sampling theory focuses mainly on sampling errors in estimation. However, non-sampling errors often influence the properties of an estimator more than sampling errors do. Practitioners and researchers recognize this, and a substantial literature on non-sampling errors has been published during the last two decades, especially on non-response error, which is one of the main components of non-sampling error. That literature typically handles one kind of non-sampling error at a time, although in real surveys more than one non-sampling error is usually present. In this paper, two kinds of non-sampling errors are considered at the estimation stage: non-response and measurement error. An exponential ratio-type estimator is developed to estimate the population mean of the response variable in the presence of non-response and measurement errors. It is shown theoretically and empirically that the proposed estimator is more efficient than the usual unbiased estimator and other existing estimators.


Introduction
Design-based estimation methods use the sampling distribution that results when the values for the finite population units are considered to be fixed, and the variation of the estimates arises from the fact that statistics are based on a random sample drawn from the population rather than a census of the entire population (see Kish 1954, Sarndal, Swensson & Wretman 1992, Kish 1994, Gregoire 1998, Koch & Gillings 2006, Binder 2008, Dorazio 1999, Shabbir, Haq & Gupta 2014).
The results of a survey are used to make quantitative statements about the population studied, i.e. descriptive statements about the aggregate population, analytic statements about the relationship among subgroups of the population, or interpretive statements about the nature of social or economic processes. A survey error occurs when there is a discrepancy between the statements and reality. These errors are of two types: sampling and non-sampling errors. Sampling errors comprise the differences between the sample and the population due solely to the particular units that happen to have been selected. Non-sampling errors encompass all other things that contribute to survey error. Non-sampling errors are said to arise from wrongly conceived definitions, imperfections in the tabulation plans, failure to obtain a response from all sample members, and so on (see Ilves 2011, Groves 1989).
In practice, the researcher faces the problem of measurement error while collecting information from individuals. Measurement error is the difference between the value that is recorded and the true value of a variable in the study. For example, in surveys of household consumption or expenditure, where the respondents are asked to report their consumption or expenditure catalog, there is a great likelihood that the respondents fail to recall precisely how much they spent on various items over the interval. Many researchers have studied measurement errors, including Cochran (1963), Cochran (1977), Fuller (1995), Shalabh (1997), Manisha & Singh (2001), Manisha & Singh (2002), Wang (2002), Allen, Singh & Smarandache (2003), Singh & Karpe (2007), Singh & Karpe (2008), Singh & Karpe (2009), Singh & Karpe (2010), Gregoire & Salas (2009), Salas & Gregoire (2010), Shukla, Pathak & Thakur (2012), and Sharma & Singh (2013). Another problem the researcher faces is non-response, which refers to the failure to collect information from one or more respondents on one or more variables. The reasons for non-response include non-availability of the respondents at home, refusal to answer the questionnaire, lack of information, etc.
Hansen & Hurwitz (1946) considered the problem of non-response while estimating the population mean by taking a sub-sample from the non-respondent group with the help of extra effort, and proposed an estimator that combines the information available from the response and non-response groups. In estimating population parameters like the mean, total or ratio, sample survey experts sometimes use auxiliary information to improve precision. When the population mean of the auxiliary variable X is known and non-response is present, the problem of estimating the population mean of the study variable Y has been discussed by Cochran (1977), Rao (1986), Khare & Srivastava (1997), Kumar, Singh, Bhougal & Gupta (2011), and Singh & Kumar (2008). In the Hansen & Hurwitz (1946) method, questionnaires are mailed to all the respondents included in a sample and a list of non-respondents is prepared after the deadline is over. Then a sub-sample is drawn from the set of non-respondents, a direct interview is conducted with the selected respondents, and the necessary information is collected.
Researchers who have studied non-response have ignored the presence of possible measurement errors, and researchers who have studied measurement errors have neglected non-response. In practice, it is possible for a researcher to face the problems of measurement error and non-response at the same time. Jackman (1999) dealt with both non-response and measurement error simultaneously in the case of voter turnout, where a reasonably large body of vote validation studies supplies auxiliary information, allowing the components of bias in survey estimates of turnout rates to be isolated. Averaging over the auxiliary information provides bounds on the quantity of interest, yielding an estimate corrected for both non-response and measurement error. Further, Dixon (2010) studied the estimation of non-response bias and measurement error using data from the Consumer Expenditure Quarterly Interview Survey (CEQ), the Current Population Survey (CPS) and the National Health Interview Survey (NHIS), in an attempt to measure the differences in employment status of Washington.
In this paper, we develop new estimators of the population mean of the variable of interest when measurement error and non-response error are present in the study variable as well as in the auxiliary variable. An empirical study is carried out to show the efficiency of the suggested estimators over some available estimators.

Sampling Procedure and Some Well Defined Estimators
A simple random sample of size $n$ is selected from the population of size $N$ by simple random sampling without replacement (SRSWOR). Let $Y$ and $X$ be the study variable and the auxiliary variable, respectively. Let $(\mu_Y, \mu_X)$ and $(\sigma_Y^2, \sigma_X^2)$ denote the population means and the population variances of the study variable $y$ and the auxiliary variable $x$.
Let $(x_i, y_i)$ be the observed values and $(X_i, Y_i)$ be the true values of the two characteristics $(x, y)$ associated with the $i$th ($i = 1, 2, \ldots, n$) sample unit. Let the measurement errors be $u_i = y_i - Y_i$ and $v_i = x_i - X_i$. The measurement errors are assumed to be random in nature and uncorrelated, with mean zero and variances $\sigma_U^2$ and $\sigma_V^2$ respectively. Let $\sigma_X^2$ and $\sigma_Y^2$ denote the population variances of the auxiliary variable $X$ and the variable of interest $Y$ respectively, let $C_y$ and $C_x$ be the population coefficients of variation of $Y$ and $X$ respectively, and let $\rho_{yx}$ be the population coefficient of correlation between $Y$ and $X$. We further assume that the measurement errors for $Y$ and $X$ are independent.
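The effect of the additive error model on the mean per unit estimator can be checked numerically. The sketch below is illustrative only: it assumes normally distributed true values and errors, a hypothetical true mean of 50, and sampling from an effectively infinite population (no finite population correction), in which case the variance of the sample mean of the observed values is $(\sigma_Y^2 + \sigma_U^2)/n$. The function names are not from the paper.

```python
import random
import statistics


def simulate_mean_variance(sigma_y, sigma_u, n, reps, seed=0):
    """Monte Carlo variance of the sample mean of observed values y_i = Y_i + u_i."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # True values Y_i plus independent zero-mean errors u_i
        # (normality and the mean of 50 are assumptions of this sketch).
        observed = [rng.gauss(50.0, sigma_y) + rng.gauss(0.0, sigma_u)
                    for _ in range(n)]
        means.append(statistics.fmean(observed))
    return statistics.pvariance(means)


def theoretical_variance(sigma_y, sigma_u, n):
    """Variance of the sample mean without finite population correction."""
    return (sigma_y ** 2 + sigma_u ** 2) / n


if __name__ == "__main__":
    simulated = simulate_mean_variance(3.0, 2.0, n=50, reps=4000, seed=1)
    print("simulated:", simulated)
    print("theoretical:", theoretical_variance(3.0, 2.0, n=50))
```

Setting `sigma_u` to zero recovers the usual variance $\sigma_Y^2/n$, so the difference between the two runs isolates the contribution of measurement error.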
If there is some non-response, Hansen & Hurwitz (1946) proposed a double sampling scheme for estimating a population mean: a simple random sample of size $n$ is selected and the questionnaire is mailed to the sampled units; the number of respondents in the sample is denoted by $n_1$ and the number of non-respondents by $n_2$; and a sub-sample of size $r$ ($r = n_2/k$; $k > 1$) is taken from the non-respondents in the sample, where $k$ is the inverse sampling ratio. Let $(x_i^*, y_i^*)$ be the observed values and $(X_i^*, Y_i^*)$ be the true values of the two characteristics $(x, y)$ associated with the $i$th ($i = 1, 2, \ldots, n$) sample unit. Let the measurement error associated with the study variable be $u_i^* = y_i^* - Y_i^*$. If there is complete response on the auxiliary variable, let the measurement error associated with the auxiliary variable be $v_i = x_i - X_i$. If there is some non-response on the auxiliary variable, let the measurement error associated with the auxiliary variable be $v_i^* = x_i^* - X_i^*$. The measurement errors are random in nature, with mean zero and variances $\sigma_U^2$ and $\sigma_V^2$ respectively for the responding units and $\sigma_{U(2)}^2$ and $\sigma_{V(2)}^2$ respectively for the non-respondents of the population. Let $\sigma_{X(2)}^2$ and $\sigma_{Y(2)}^2$ be the variances of $X$ and $Y$ respectively for the non-respondents of the population, and let $\rho_{yx(2)}$ be the coefficient of correlation between $X$ and $Y$ for the non-respondents of the population. Let $C_{x(2)}$ and $C_{y(2)}$ be the coefficients of variation of $X$ and $Y$ respectively for the non-respondents in the population. It is further assumed that the measurement errors for $X$ and $Y$ are independent.
The usual unbiased estimator of the population mean of the study variable in the presence of measurement error is the mean per unit estimator $t_0$; its variance in the presence of measurement error reflects both the variance $\sigma_Y^2$ of the study variable and the measurement error variance $\sigma_U^2$. Shalabh (1997) developed a ratio-type estimator, $t_1$, in the presence of measurement error, with a mean square error obtained using the finite population correction factor. Shukla et al. (2012) developed an estimator, $t_2$, that depends on a suitable constant $\alpha$.
The mean square error of $t_2$ in the presence of measurement error consists of two components: the mean square error of $t_2$ without measurement error, and the contribution of measurement error to the mean square error of $t_2$.
The optimum value of µ 1 is given as Cy Cx = µ 0 (say), and the optimum mean square error of $t_2$ is obtained at this value.
When there is some non-response, it is assumed that the population of size $N$ is composed of two mutually exclusive groups, the $N_1$ respondents and the $N_2$ non-respondents, though their sizes are unknown. Let $\mu_{Y(1)}$ and $\sigma_{Y(1)}^2$ denote the mean and variance of the response group. Similarly, let $\mu_{Y(2)}$ and $\sigma_{Y(2)}^2$ denote the mean and variance of the non-response group. The population mean can be written as $\mu_Y = W_1 \mu_{Y(1)} + W_2 \mu_{Y(2)}$, where $W_1 = N_1/N$ and $W_2 = N_2/N$. Let $\mu_{y1} = \frac{1}{n_1}\sum_{i=1}^{n_1} y_i$ and $\mu_{y2r} = \frac{1}{r}\sum_{i=1}^{r} y_i$ denote the means of the $n_1$ responding units and the $r$ sub-sampled units. Thus an unbiased estimator of the population mean $\mu_Y$ due to Hansen and Hurwitz is given by
$$\mu_y^* = w_1 \mu_{y1} + w_2 \mu_{y2r},$$
where $w_1 = n_1/n$ and $w_2 = n_2/n$ are the responding and non-responding proportions in the sample. The variance of $\mu_y^*$, to terms of order $n^{-1}$, is given by
$$\mathrm{Var}(\mu_y^*) = \frac{1-f}{n}\,\sigma_Y^2 + \frac{W_2 (k-1)}{n}\,\sigma_{Y(2)}^2,$$
where $f = n/N$ (see Cochran 1977, p. 371).
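The weighting used by the Hansen and Hurwitz estimator can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `hansen_hurwitz_mean` and the rounding of $r = n_2/k$ to the nearest integer (with at least one unit followed up) are assumptions made here for the example.

```python
import random
import statistics


def hansen_hurwitz_mean(respondent_values, nonrespondent_values, k, rng):
    """Combine respondents with a followed-up sub-sample of non-respondents.

    Returns w1 * mean(respondents) + w2 * mean(sub-sample), where
    w1 = n1 / n and w2 = n2 / n are the sample proportions.
    """
    n1 = len(respondent_values)
    n2 = len(nonrespondent_values)
    n = n1 + n2
    if n2 == 0:
        # Complete response: the estimator is the ordinary sample mean.
        return statistics.fmean(respondent_values)
    # Sub-sample size r = n2 / k, rounded here (an assumption of this sketch).
    r = max(1, round(n2 / k))
    subsample = rng.sample(nonrespondent_values, r)
    mean_resp = statistics.fmean(respondent_values) if n1 else 0.0
    return (n1 / n) * mean_resp + (n2 / n) * statistics.fmean(subsample)


if __name__ == "__main__":
    rng = random.Random(0)
    answered = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
    not_answered = [20.0, 22.0, 19.0, 21.0]  # values obtained only by interview
    print(hansen_hurwitz_mean(answered, not_answered, k=2, rng=rng))
```

The non-respondent mean is weighted by the full proportion $w_2 = n_2/n$ even though only $r$ of the $n_2$ units are interviewed, which is what keeps the estimator unbiased when respondents and non-respondents differ.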
Let information on an auxiliary variable $x$, correlated with the study variable $y$, be available. In some situations, information on the auxiliary variable is not fully available, i.e. there is non-response on the auxiliary variable. In a similar manner to the above, one can define $\mu_{x1} = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i$ and $\mu_{x2r} = \frac{1}{r}\sum_{i=1}^{r} x_i$ as the means of the $n_1$ responding units and the $r$ sub-sampled units. In this situation, an unbiased estimator of the population mean $\mu_X$ of the auxiliary variable is $\mu_x^* = w_1 \mu_{x1} + w_2 \mu_{x2r}$. Cochran (1977) proposed a ratio-type estimator of the population mean, $t_3$, for which the mean square error is available. Singh & Karpe (2008) suggested a generalized estimator of the population mean, $t_4$, that depends on two suitably chosen constants $\alpha_1$ and $\alpha_2$.
The optimum mean square error of $t_4$ is obtained at the optimum values of its constants. In the situation where there is measurement error and non-response in both the study and the auxiliary variables, one can obtain an estimator $t_5$. The MSE of the estimator $t_5$ consists of two components: the mean square error of $t_5$ without measurement error, and the contribution of measurement error to the mean square error of $t_5$.
In the present study, we propose an exponential ratio-type estimator for the situation where non-response and measurement errors are present in both the study variable and the auxiliary variable.

The Suggested Estimator
Non-sampling errors are present in both sample surveys and censuses, and can occur at any stage of the survey process. There are many potential sources of non-sampling error, for example, businesses not responding to a survey, processing errors, or respondents unintentionally reporting incorrect values. The greater the impact of these sources of error, the greater the difference between the survey (or census) estimate and the true value. The proposed estimator $t$ is defined for the case where non-response and measurement error are present in both the study and the auxiliary variable. For $\alpha = 0$, the proposed estimator $t$ becomes equal to the estimator $t_5$.
To obtain the expression for the mean squared error (MSE) of the proposed estimator, the estimator is expanded and simplified, ignoring terms of order greater than two, which gives equation (15). Taking expectations on both sides of (15) gives the bias of $t$. Squaring both sides of equation (15), ignoring terms of order greater than two and taking expectations gives the MSE of $t$, which consists of two components: the MSE of $t$ without measurement error, and the contribution of measurement error to the MSE of $t$.

Empirical Study
In this section, we compare the performance of the different estimators with that of the usual unbiased estimators by generating four populations from the normal distribution with different choices of parameters, using the R language. The auxiliary variable X has been generated from a N (5, 10) population. This type of population is relevant in most socio-economic situations with one variable of interest and one auxiliary variable.
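A simulation of this kind can be sketched as follows. The sketch is written in Python rather than R and is not the authors' program. Several details are assumptions made only for illustration: N (5, 10) is read as mean 5 and variance 10, the study variable is generated by a hypothetical model Y = 2X + noise, the ratio-type estimate takes the conventional form of the Hansen-Hurwitz mean of y multiplied by the known mean of X and divided by the Hansen-Hurwitz mean of x, and the sample size, k and error standard deviations are arbitrary. The percent relative efficiency (PRE) is computed as 100 times the ratio of Monte Carlo mean squared errors.

```python
import random
import statistics


def make_population(n_resp, n_nonresp, rng):
    """Hypothetical population: X ~ N(5, variance 10), Y = 2X + noise.

    Each unit is (x, y, responds); the last n_nonresp units never answer
    the first contact.
    """
    units = []
    for i in range(n_resp + n_nonresp):
        x = rng.gauss(5.0, 10.0 ** 0.5)
        y = 2.0 * x + rng.gauss(0.0, 2.0)
        units.append((x, y, i < n_resp))
    return units


def hh_means(units, n, k, sd_u, sd_v, rng):
    """Hansen-Hurwitz means (of y, of x) from one SRSWOR sample."""
    sample = rng.sample(units, n)
    resp = [u for u in sample if u[2]]
    nonresp = [u for u in sample if not u[2]]
    r = max(1, round(len(nonresp) / k)) if nonresp else 0
    followed = rng.sample(nonresp, r)

    def observed_means(group):
        # Observed value = true value + zero-mean measurement error.
        ys = [y + rng.gauss(0.0, sd_u) for _, y, _ in group]
        xs = [x + rng.gauss(0.0, sd_v) for x, _, _ in group]
        return statistics.fmean(ys), statistics.fmean(xs)

    w1, w2 = len(resp) / n, len(nonresp) / n
    y1, x1 = observed_means(resp) if resp else (0.0, 0.0)
    y2, x2 = observed_means(followed) if followed else (0.0, 0.0)
    return w1 * y1 + w2 * y2, w1 * x1 + w2 * x2


def pre_of_ratio(units, n, k, sd_u, sd_v, reps, rng):
    """Monte Carlo PRE (%) of the ratio-type estimate vs. the Hansen-Hurwitz mean."""
    mu_y = statistics.fmean(y for _, y, _ in units)
    mu_x = statistics.fmean(x for x, _, _ in units)
    sq_hh = sq_ratio = 0.0
    for _ in range(reps):
        my, mx = hh_means(units, n, k, sd_u, sd_v, rng)
        sq_hh += (my - mu_y) ** 2
        sq_ratio += (my * mu_x / mx - mu_y) ** 2
    return 100.0 * sq_hh / sq_ratio


rng = random.Random(2024)
population = make_population(4500, 500, rng)
for label, sd in (("without", 0.0), ("with", 1.0)):
    pre = pre_of_ratio(population, n=200, k=2, sd_u=sd, sd_v=sd, reps=300, rng=rng)
    print(f"PRE {label} measurement error: {pre:.1f}")
```

A PRE above 100 means the ratio-type estimate has a smaller Monte Carlo mean squared error than the Hansen-Hurwitz mean under these assumed settings; varying `k` and the split between respondents and non-respondents gives comparisons of the kind reported in the tables.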
Table 1: Percent relative efficiencies of the estimators with respect to the Hansen and Hurwitz (1946) estimator $\mu_y^*$ for population I.

Table columns: N1, N2, estimator, PRE without measurement error, PRE with measurement error.

Table 3: Percent relative efficiencies of the estimators with respect to the Hansen and Hurwitz (1946) estimator $\mu_y^*$ for population III.
From the above tables, the following points are noted:
1. The proposed estimator t at its optimum is more efficient than the Hansen and Hurwitz estimator and the estimator t 5 for all four populations.
For population I, it is noted that:
2. When N 1 = 4500, N 2 = 500, the PREs of the estimators t 5 and t (opt) with respect to µ * y increase with k, both with and without measurement error.
3. When N 1 = 4250, N 2 = 750, the PREs decrease with k in the case with measurement error and increase with k in the case without measurement error.
4. When N 1 = 4000, N 2 = 1000, the PREs increase with k, both with and without measurement error.
For population II, it is noted that:
5. When N 1 = 4500, N 2 = 500, the PREs of the estimators t 5 and t (opt) with respect to µ * y decrease with k in the case without measurement error and increase with k in the case with measurement error.
6. When N 1 = 4250, N 2 = 750, the PREs of the estimators increase with k, both with and without measurement error.
7. When N 1 = 4000, N 2 = 1000, the PREs decrease with k, both with and without measurement error.
For population III, it is noted that:
8. When N 1 = 4500, N 2 = 500 and when N 1 = 4000, N 2 = 1000, the PREs decrease with k in the case without measurement error and increase with k in the case with measurement error.
9. When N 1 = 4250, N 2 = 750, the PREs increase with k in the case without measurement error and decrease with k in the case with measurement error.
10. For population IV, it is observed that, with poor correlation between the study and the auxiliary variable, the proposed estimator performs better than the other estimators.
From Table 5, it is observed that the proposed estimator at its optimum is more efficient than the usual unbiased estimator, Shalabh's (t 1 ) and Shukla et al.'s (t 2 ) estimators for estimating the population mean in the presence of measurement error.

Conclusion
An important goal is understanding, managing, controlling, and reporting known sources of error that affect the quality of our statistics. In the present study, we have proposed an estimator of the population mean of the study variable. The suggested estimator uses auxiliary information to improve efficiency in situations where non-response and measurement errors affect both the study variable and the auxiliary variable. The relative performance of the proposed estimator is compared with that of conventional estimators. The proposed estimator performs better than the usual unbiased estimator t 0 , Shalabh's (1997) (t 1 ) and Shukla, Pathak & Thakur's (2012) (t 2 ) estimators in the presence of measurement error, and it performs better than the usual unbiased estimator µ * y and the estimator t 5 in the presence of measurement error and non-response. These findings are supported by an empirical study based on four populations. We recommend the proposed estimator for future use in studying the characteristics of a variable of interest where measurement errors and non-response occur in the survey.

Table 2: Percent relative efficiencies of the estimators with respect to the Hansen and Hurwitz (1946) estimator µ * y for population II.

Table 4: Percent relative efficiencies of the estimators with respect to the Hansen and Hurwitz (1946) estimator µ * y for population IV.

Table 5: Percent relative efficiencies of the estimators with respect to the usual unbiased estimator in the presence of measurement error (no non-response).