Joint influence of measurement errors and randomized response technique on mean estimation under stratified double sampling

The present study proposes a generalized mean estimator for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement errors based on the Randomized Response Technique (RRT). Expressions for the bias and mean squared error of the proposed estimator are derived up to the first order of approximation. Furthermore, the optimum conditions and the minimum mean squared error of the proposed estimator are determined. The efficiency of the proposed estimator is studied both theoretically and numerically using simulated and real data sets. The numerical study reveals that the use of the RRT in a survey contaminated with measurement errors increases the variances and mean squared errors of estimators of the finite population mean.


Introduction
Auxiliary variables are closely related to the survey variable and are used at the design and estimation stages of a survey to improve the efficiency of estimators of the finite population mean. The difference between the true value of a variable and the value recorded in a survey is referred to as measurement error. Measurement errors are caused by memory loss, prestige bias, over-reporting, under-reporting, processing errors, and incorrect values supplied by the respondent. In the literature, most researchers assume that the data collected in a survey are error-free. However, this is not the case: the problem of measurement errors is inherent in survey sampling.
In a survey, the researcher often faces the problem of estimating the finite population mean for a sensitive survey question with a socially stigmatizing characteristic, such as "Have you ever had an abortion?", "Are you a drug addict?" and "Have you ever been infected with a sexually transmitted disease?". It is challenging to obtain correct responses to such questions in personal interviews that involve direct questioning, because the respondent's privacy is unprotected. Consequently, this may result in measurement errors. Warner [1] proposed the Randomized Response Technique (RRT), which aims to reduce answer bias in a survey involving a sensitive variable by protecting the privacy of the respondents. In the RRT, a scrambling variable that is independent of the survey and auxiliary variables is used in the estimation of the finite population mean of a sensitive variable. The respondent is expected to provide a true response for the non-sensitive auxiliary variable and a scrambled response for the survey variable. The scrambled response is obtained by adding a random number to the true response to a sensitive question. The value added is unknown to the survey practitioner, but the probability distribution of the scrambling variable is assumed to be known.
The problem of estimating the finite population mean for a non-sensitive variable using an auxiliary variable under simple random sampling is addressed by Shalabh [2], Diwakar et al. [3], and Yadav et al. [4]. Additionally, Gajendra et al. [5] used calibrated weights to propose ratio and regression-type mean estimators for a non-sensitive variable under stratified random sampling.

The problem of estimating the finite population mean for a sensitive variable based on the Randomized Response Technique (RRT) under different sampling schemes is addressed by Eichhorn and Hayre [6], Gupta and Shabbir [7], Gupta et al. [8], Sousa et al. [9], and Tanveer and Housila [10].
Mushtaq et al. [11] and Mushtaq et al. [12] have proposed different estimators of the finite population mean for a sensitive variable using a non-sensitive auxiliary variable under stratified random sampling. The problem of estimating the finite population mean under stratified two-phase sampling is discussed by Mushtaq et al. [12]. The joint influence of double sampling and the RRT on the estimation of the finite population mean under simple random sampling is addressed by Mushtaq and Noor-Ul-Amin [13]. Additionally, the problem of estimating the finite population mean for a sensitive variable in the presence of non-response based on the RRT is discussed by Naeem and Shabbir [14]. Zahid and Shabbir [15] proposed a generalized class of estimators of the finite population mean using a non-sensitive auxiliary variable in the presence of non-response and measurement errors under simple random sampling and stratified random sampling.
Sadia [16] proposed generalized estimators of the finite population mean in the presence of measurement errors under simple random sampling and stratified random sampling. The performances of the proposed estimators were studied in the presence and absence of the measurement errors. Recently, Zhang [17] addressed the problem of mean estimation for a sensitive variable based on optional Randomized Response Technique (RRT) in the presence of non-response and measurement errors under simple random sampling and stratified random sampling.
Handling sensitive survey questions and measurement errors is a major challenge for survey practitioners especially when both occur simultaneously in a survey. The present study fills the existing gap in the literature on mean estimation for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement errors under stratified double sampling. Also, the combined effect of measurement errors and Randomized Response Technique (RRT) on estimators of the finite population mean is investigated.
The study considers an additive Randomized Response Technique (RRT) model in which the respondent adds a random number to the true answer of a sensitive question to give a scrambled response. Further, the probability distribution of the scrambling variable is assumed to be known by the survey practitioner. The proposed strategy assumes that measurement errors are present in both first and second-phase samples of stratified double sampling.
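The additive scrambling model described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's procedure: the true values, the sample size, the seed, and the reading of the scrambling distribution as N(0, variance 2) are all assumptions made for the example.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Hypothetical true sensitive values (never seen by the interviewer).
true_values = [rng.gauss(50.0, 15.0) for _ in range(10_000)]

# Each respondent privately adds a draw from the known scrambling distribution.
SCRAMBLE_SD = 2 ** 0.5  # assumed: scrambling variable ~ N(0, variance 2)
scrambled = [y + rng.gauss(0.0, SCRAMBLE_SD) for y in true_values]

mean_true = statistics.fmean(true_values)
mean_scrambled = statistics.fmean(scrambled)
var_true = statistics.variance(true_values)
var_scrambled = statistics.variance(scrambled)

# The scrambling has mean zero, so the two means should be close,
# while the scrambled responses are more variable than the true ones.
print(f"mean of true values:      {mean_true:.3f}")
print(f"mean of scrambled values: {mean_scrambled:.3f}")
print(f"variance, true vs scrambled: {var_true:.2f} vs {var_scrambled:.2f}")
```

Because the scrambling variable has mean zero, the mean of the scrambled responses stays close to the mean of the true values, but the added noise makes the responses more variable. That extra variability is the price paid for respondent privacy.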
In the present paper, Section 2 gives a detailed description of the population under study. The ordinary mean estimator of the finite population mean for a sensitive variable is discussed in Section 3. Section 4 describes the properties of the proposed estimator of the finite population mean for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement error. In Section 5, members of the family of the proposed generalized estimator are discussed. The efficiency of the proposed estimator is studied theoretically in Section 6. Finally, a numerical analysis of the performance of the proposed estimator is done in Section 7.

Population description and notations
Consider a heterogeneous population $U = \{1, 2, \ldots, N\}$ of size $N$ with a survey variable $Y$ and an auxiliary variable $X$. The population is divided into $L$ homogeneous groups, known as strata, of sizes $N_h$ ($h = 1, 2, \ldots, L$). In a survey, direct observations cannot be made on a sensitive variable with socially stigmatizing characteristics; hence the Randomized Response Technique (RRT) is used to obtain unbiased estimates of the finite population parameters. Let $S$ be a scrambling variable that is normally distributed with mean $0$ and variance $S^2_{Sh}$. The respondent is expected to provide a true response for the auxiliary variable and a scrambled response for the sensitive variable. Let $Z_{hi} = Y_{hi} + S_{hi}$ denote the $i$th scrambled response in the $h$th stratum, and let $X_{hi}$ denote the $i$th value of $X$ in the $h$th stratum. Additionally, let $\bar{Z}_h$ and $\bar{X}_h$ be the population means of $Z$ and $X$ respectively in the $h$th stratum, and let $S^2_{Zh}$ and $S^2_{Xh}$ be the corresponding population variances. Let $S_{ZXh}$ and $\rho_{ZXh}$ denote the covariance and the coefficient of correlation between $Z$ and $X$ in the $h$th stratum.
In the presence of measurement errors, let $(x^*_{hi}, z^*_{hi})$ and $(X^*_{hi}, Z^*_{hi})$ be the observed and true values of $X$ and $Z$ respectively for the $i$th unit in the $h$th stratum. Let $T^*_{hi} = z^*_{hi} - Z^*_{hi}$ and $V^*_{hi} = x^*_{hi} - X^*_{hi}$ denote the measurement errors associated with $Z$ and $X$ respectively in the $h$th stratum. The measurement errors are assumed to be normally distributed with mean zero and variances $S^2_{Th}$ and $S^2_{Vh}$ for $Z$ and $X$ respectively in the $h$th stratum.
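A short worked step may help show why scrambling and measurement error both inflate variability. It assumes that the true value, the scrambling variable and the measurement error are mutually independent within a stratum. The symbol $S^2_{Yh}$ for the stratum variance of $Y$ is introduced only for this sketch. Under that assumption the variances add:

```latex
\begin{align*}
E(Z_{hi}) &= E(Y_{hi}) + E(S_{hi}) = E(Y_{hi}), \\
S^2_{Zh} &= S^2_{Yh} + S^2_{Sh}, \\
\operatorname{Var}(z^*_{hi}) &= S^2_{Zh} + S^2_{Th}
  = S^2_{Yh} + S^2_{Sh} + S^2_{Th}.
\end{align*}
```

Since the scrambling variable has mean zero, the mean of the sensitive variable is preserved. Each additional noise source, however, contributes its own variance to the observed response.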
A relatively large first-phase sample of size $n'$ is drawn from the population using simple random sampling without replacement (SRSWOR), and the units are classified into $L$ homogeneous strata of sizes $n'_h$. A second-phase random sample of size $n_h$ is then drawn from the first-phase sample in each stratum using SRSWOR, and both the survey and auxiliary variables are studied. Let $\bar{x}'_h$ denote the first-phase $h$th stratum sample mean for $X$. Further, let $\bar{x}_h$ and $\bar{z}_h$ denote the second-phase $h$th stratum sample means for $X$ and $Z$ respectively. The error terms associated with these sample means are defined in Equations (1)-(3). Taking expectations on both sides of Equations (1)-(3), and squaring both sides of Equations (1)-(3) and then taking expectations, gives Equations (4)-(9), where $\theta_h = 1$ [...]
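The two-phase selection described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name `stratified_double_sample`, the toy population of 600 units in three strata, the first-phase size of 300, and the second-phase sampling fraction of 0.4 are all hypothetical and not taken from the study.

```python
import random
from collections import defaultdict


def stratified_double_sample(population, n_first, frac_second, rng):
    """Draw a stratified double sample.

    population: list of (unit_id, stratum_label) pairs.
    Returns (first_phase, second_phase), each a dict mapping
    stratum label -> list of unit ids.
    """
    # Phase 1: SRSWOR of n_first units from the whole population.
    first = rng.sample(population, n_first)

    # Classify the first-phase units into their strata.
    first_phase = defaultdict(list)
    for unit_id, stratum in first:
        first_phase[stratum].append(unit_id)

    # Phase 2: SRSWOR within each first-phase stratum.
    second_phase = {}
    for stratum, ids in first_phase.items():
        n_h = max(1, round(frac_second * len(ids)))
        second_phase[stratum] = rng.sample(ids, n_h)
    return dict(first_phase), second_phase


rng = random.Random(42)
population = [(i, i % 3) for i in range(600)]  # toy population, 3 strata
first_phase, second_phase = stratified_double_sample(population, 300, 0.4, rng)

for h in sorted(first_phase):
    print(f"stratum {h}: first phase {len(first_phase[h])}, "
          f"second phase {len(second_phase[h])}")
```

In this sketch the auxiliary variable would be recorded for every first-phase unit, while the scrambled response would be collected only from the smaller second-phase subsample.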

Existing estimators in the literature
The ordinary mean estimator in the presence of measurement errors in stratified double sampling is defined in Equation (10), and its variance is given in Equation (11).

Proposed estimator
Let $\bar{x}'_h$ and $\bar{x}_h$ denote the first- and second-phase stratum sample means for the auxiliary variable respectively. Further, let $\bar{z}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} z_{hi}$ denote the mean of the scrambled responses in the second-phase stratum sample, and let $w_h$ denote the $h$th stratum weight. The proposed estimator of the finite population mean in the presence of measurement errors is given in Equation (12), where $\alpha_h$ is a suitably chosen constant whose value is to be determined. Substituting Equations (1)-(3) into Equation (12), expanding by Taylor's approximation while ignoring terms of order greater than two, and then subtracting the population mean gives Equation (13). Taking expectations on both sides of Equation (13) and substituting Equations (4)-(9) gives the approximate bias, Equation (14). Squaring both sides of Equation (13), simplifying while ignoring terms of order greater than two, and then taking expectations gives the approximate mean squared error, Equation (15). Differentiating Equation (15) partially with respect to $\alpha_h$ and equating to zero gives the optimum value of $\alpha_h$, Equation (16). Substituting Equation (16) into Equation (15) gives the minimum mean squared error, Equation (17).
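The optimization step can be illustrated generically. Suppose, as an assumption for this sketch only, that the first-order mean squared error contribution of stratum $h$ is a quadratic function of $\alpha_h$. The coefficients $A_h$, $B_h$ and $C_h > 0$ and the function $M_h$ are hypothetical placeholders, not the paper's expressions. Differentiating and setting the derivative to zero then gives:

```latex
\begin{align*}
M_h(\alpha_h) &= A_h - 2\alpha_h B_h + \alpha_h^2 C_h, \\
\frac{\partial M_h}{\partial \alpha_h} &= -2B_h + 2\alpha_h C_h = 0
  \;\Longrightarrow\; \alpha_h^{\mathrm{opt}} = \frac{B_h}{C_h}, \\
M_h\bigl(\alpha_h^{\mathrm{opt}}\bigr) &= A_h - \frac{B_h^2}{C_h}.
\end{align*}
```

Under this assumed form, $M_h(\alpha_h) - M_h(\alpha_h^{\mathrm{opt}}) = C_h\,(\alpha_h - B_h/C_h)^2 \ge 0$, so no fixed choice of $\alpha_h$ can have a smaller first-order mean squared error than the optimum.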

Members of the family of the proposed generalized estimator
Members of the family of the proposed estimator are obtained as follows:
(i) For $\alpha_h = \frac{1}{2}$, the proposed estimator reduces to the ratio estimator $t_r$ given in Equation (18). Its bias and mean squared error are given in Equations (19) and (20).
(ii) For $\alpha_h = 1$, the proposed estimator reduces to the exponential ratio-type estimator $t_{err}$ given in Equation (21). Its bias and mean squared error are given in Equations (22) and (23).
(iii) For $\alpha_h = 0$, the proposed estimator reduces to the exponential ratio-product-type estimator $t_{erp}$ given in Equation (24). Its bias and mean squared error are given in Equations (25) and (26).

Efficiency comparison
In this section, the performance of the proposed estimator is studied theoretically by comparing its minimum mean squared error with the variance or mean squared error of each competing estimator:
i. the ordinary mean estimator, from Equations (11) and (17);
ii. the ratio estimator, from Equations (17) and (20);
iii. the exponential ratio-type estimator, from Equations (17) and (23);
iv. the exponential ratio-product-type estimator, from Equations (17) and (26).
The resulting inequalities provide the conditions under which the proposed optimum estimator is more efficient than the existing estimators of the finite population mean. The numerical study shows that these conditions hold for the data sets considered; hence the proposed optimum estimator is recommended for use by survey practitioners when the conditions hold. Furthermore, the proposed strategy is useful for constructing accurate confidence intervals for unknown population parameters in a survey that is based on the Randomized Response Technique (RRT) and contaminated with measurement errors.

Numerical study
A numerical study is conducted using both simulated and real data sets to compare the performance of the proposed estimator with some existing estimators in the literature. The real data set is obtained from Sarndal et al. [18]. The simulated data are generated using the R programming language. The data sets consist of the survey variable $Y$ and the auxiliary variable $X$. A normally distributed scrambling value, $S_{hi} \sim N(0, 2)$, is generated for each unit in the data set. Thereafter, the response variable is obtained as $Z_{hi} = Y_{hi} + S_{hi}$. Finally, normally distributed measurement errors with mean 2 and variance 5 are introduced to each unit of the response and auxiliary variables (note that the third argument of R's rnorm is the standard deviation, so a call such as rnorm(250, 2, 5) uses a standard deviation of 5). The efficiency of the proposed estimator is compared with that of the other estimators using the minimum variance and the Percent Relative Efficiency (PRE) criteria. The PRE is computed for each estimator $t_j$, where $t_j = t_g, t_r, t_{err}$ and $t_{erp}$ denotes an estimator of the finite population mean. The estimator with the highest PRE is considered the most efficient. The performances of the estimators are compared with and without measurement errors. The populations are described as follows.
Population I: Simulated data
Stratum 1: (100, 450, 15), (100, 0, 1), (100, 0, 0.2)
Stratum 2: (250, 50, 15), ..., and $z_2 = Z_2$ + rnorm(250, 2, 5)
Stratum 3: $X_3$ = rnorm(300, 920, 25), ...
Population II: Sarndal et al. [18]
The population consists of five strata of sizes $N_1 = 38$, $N_2 = 14$, $N_3 = 11$, $N_4 = 33$, and $N_5 = 24$.
Table 1 presents summary statistics for Populations I and II. Tables 2 and 3 show the contribution of measurement errors and the Randomized Response Technique (RRT) to the bias, mean squared error (MSE), and Percent Relative Efficiency (PRE) of the mean estimators. The numerical study shows that the MSE of each estimator is lower without measurement errors and increases when measurement errors are introduced into the survey. Moreover, the PRE of the mean estimators decreases when measurement errors are present. Additionally, the proposed generalized estimator has the smallest bias among the estimators of the finite population mean considered. A key finding of the study is that the proposed estimator performs better than the other estimators both with and without measurement errors, for both the real and the simulated data.
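The qualitative effect of measurement errors on the MSE and PRE can be reproduced with a small Monte Carlo sketch in Python. This is not the paper's simulation: it uses a plain stratified mean under SRSWOR rather than the proposed double-sampling estimator, and the strata sizes, sample sizes, seed and number of replications are illustrative assumptions. The PRE form used here, 100 times the reference MSE divided by the estimator's MSE, is also an assumption, since the original expression is not reproduced in the text.

```python
import random
import statistics

# Hypothetical strata: (N_h, mean of Y, sd of Y, n_h). These values are
# illustrative assumptions that only loosely echo the simulated population.
STRATA = [(100, 450.0, 15.0, 30), (250, 50.0, 15.0, 60)]
SCRAMBLE_SD = 2 ** 0.5      # assumed: S ~ N(0, variance 2)
ME_MEAN, ME_SD = 2.0, 5.0   # mirrors rnorm(n, 2, 5): mean 2, sd 5


def simulate_mse(with_measurement_error, reps=2000, seed=1):
    """Monte Carlo MSE of the stratified mean of scrambled responses."""
    rng = random.Random(seed)
    # Fixed finite population of true sensitive values, one list per stratum.
    pops = [[rng.gauss(mu, sd) for _ in range(N)] for N, mu, sd, _ in STRATA]
    pop_size = sum(N for N, _, _, _ in STRATA)
    true_mean = sum(sum(p) for p in pops) / pop_size

    squared_errors = []
    for _ in range(reps):
        estimate = 0.0
        for (N, _, _, n), pop in zip(STRATA, pops):
            sample = rng.sample(pop, n)  # SRSWOR within the stratum
            reported = []
            for y in sample:
                z = y + rng.gauss(0.0, SCRAMBLE_SD)  # scrambled response
                if with_measurement_error:
                    z += rng.gauss(ME_MEAN, ME_SD)   # recording error
                reported.append(z)
            estimate += (N / pop_size) * statistics.fmean(reported)
        squared_errors.append((estimate - true_mean) ** 2)
    return statistics.fmean(squared_errors)


def percent_relative_efficiency(mse_reference, mse_estimator):
    """Assumed PRE form: 100 * MSE(reference) / MSE(estimator)."""
    return 100.0 * mse_reference / mse_estimator


mse_clean = simulate_mse(with_measurement_error=False)
mse_noisy = simulate_mse(with_measurement_error=True)
pre_noisy = percent_relative_efficiency(mse_clean, mse_noisy)

print(f"MSE without measurement error: {mse_clean:.3f}")
print(f"MSE with measurement error:    {mse_noisy:.3f}")
print(f"PRE of the contaminated estimator: {pre_noisy:.1f}")
```

In this sketch the measurement errors have a nonzero mean, so they contribute both extra variance and a squared-bias term to the MSE. The PRE of the contaminated estimator therefore falls below 100 relative to the error-free case.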

Conclusion
The study proposes a generalized estimator of the finite population mean for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement errors based on the Randomized Response Technique (RRT). Expressions for the bias and Mean Squared Error (MSE) of the proposed estimator have been derived up to the first order of approximation. The performance of the proposed estimator has been studied both theoretically and numerically. The numerical study reveals that the presence of measurement errors in a survey based on the RRT increases the variance and MSE of the estimators and results in biased estimates of the finite population mean. Finally, the proposed strategy is applicable in surveys involving sensitive variables such as bribery, cheating in examinations, drug abuse, homosexuality, habitual tax evasion, reckless driving, abortion, and indiscriminate gambling, among others.