An Improved Estimation Procedure of the Mean of a Sensitive Variable Using Auxiliary Information



Introduction
Let $Y$ be the variable under study, a sensitive variable that cannot be observed directly. Let $X$ be a non-sensitive auxiliary variable that is strongly correlated with $Y$. Let $S$ be a scrambling variable independent of the study variable $Y$ and the auxiliary variable $X$. The usual additive model for gathering information on a quantitative sensitive variable is due to Himmelfarb & Edgell [3]. Their model allows the interviewee to hide personal information by adding a scrambling variable to the response. The respondent is asked to report a scrambled response for the study variable $Y$ (based on the additive model) given by $Z_a = Y + S$, but is asked to provide a true response for the auxiliary variable $X$ [1].
Hussain [4] has discussed the use of subtractive scrambling. Following Hussain [4], the respondent is asked to report a scrambled response for the study variable $Y$ (based on the subtractive model) given by $Z_s = Y - S$, but is asked to provide a true response for the auxiliary variable $X$. Gjestvang & Singh [5] have pointed out that "the practical application of an additive model is much easier than the multiplicative model, that is, respondents may like to add two numbers rather than doing painstaking work of multiplying two numbers or dividing two numbers: thus the improvement of the additive model has its own importance in the literature". In view of the forms of the additive and subtractive models and the above argument due to Gjestvang & Singh [5], we introduce a new model (which is additive in nature), $Z_\phi = Y + \phi S$, where $\phi$ is a known scalar such that $-1 \le \phi \le 1$. The proposed model generalizes both the usual additive model ($\phi = 1$) and the subtractive model ($\phi = -1$).
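The scrambling step can be illustrated with a minimal sketch, assuming the model reads $Z_\phi = Y + \phi S$ and that the scrambling variable has mean zero. The function name `scramble`, the normal distribution chosen for $S$, and all numerical values are assumptions made for illustration only; they do not come from the paper.

```python
import random
import statistics

def scramble(y_values, phi, s_sd, rng):
    """Return the reported values Z_phi = Y + phi * S, one per respondent.

    S is drawn here from Normal(0, s_sd); that distribution is an assumption
    of this sketch, chosen only so that E(S) = 0.
    """
    if not -1.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [-1, 1]")
    return [y + phi * rng.gauss(0.0, s_sd) for y in y_values]

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical true values of the sensitive variable.
y_true = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# phi = 1: additive model, phi = -1: subtractive model, phi = 0: no scrambling.
for phi in (1.0, -1.0, 0.5, 0.0):
    z = scramble(y_true, phi, s_sd=5.0, rng=rng)
    print(f"phi={phi:+.1f}  mean(Z)={statistics.mean(z):.2f}  "
          f"var(Z)={statistics.variance(z):.2f}")
```

Under these assumptions the mean of the reported values stays close to the mean of $Y$ for every $\phi$, while the variance grows roughly with $\phi^2$ times the variance of $S$.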
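Keeping the proposed model $Z_\phi = Y + \phi S$ in view, the respondent is asked to report a scrambled response $Z_\phi$ for $Y$ but is asked to provide a true response for $X$. Let a simple random sample of size $n$ be drawn without replacement from a finite population $U = (U_1, U_2, \ldots, U_N)$. For the $i$th unit ($i = 1, 2, \ldots, N$), let $y_i$ and $x_i$ respectively be the values of the study variable $Y$ and the auxiliary variable $X$. Further, let $\bar{Y}$, $\bar{X}$ and $\bar{Z}_\phi$ be the population means of $Y$, $X$ and $Z_\phi$ respectively, and let $\bar{z}_\phi$ and $\bar{x}$ be the corresponding sample means. We assume that the population mean $\bar{X}$ of the auxiliary variable $X$ is known and that $E(S) = 0$, so that $E(Z_\phi) = E(Y)$. We also define $e_{z\phi} = (\bar{z}_\phi - \bar{Z}_\phi)/\bar{Z}_\phi$ and $e_x = (\bar{x} - \bar{X})/\bar{X}$, such that $E(e_{z\phi}) = E(e_x) = 0$ and, with $f = n/N$,

$$E(e_{z\phi}^2) = \frac{1-f}{n} C_{z\phi}^2, \qquad E(e_x^2) = \frac{1-f}{n} C_x^2, \qquad E(e_{z\phi} e_x) = \frac{1-f}{n} \rho_{xz\phi} C_x C_{z\phi}.$$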

Biostatistics and Biometrics Open Access Journal
Here $C_x$ and $C_{z\phi}$ are the coefficients of variation of $X$ and $Z_\phi$ respectively, and $\rho_{xz\phi}$ is the correlation coefficient between $X$ and $Z_\phi$. The square of the coefficient of variation of $Z_\phi$ (i.e. $C_{z\phi}^2$) is given by $C_{z\phi}^2 = C_y^2 + \phi^2 S_s^2/\bar{Y}^2$, where $C_y$ is the coefficient of variation of $Y$ and $S_s^2$ is the variance of $S$. Further, if the information on the auxiliary variable $X$ is not utilized, then the mean square error of the estimator $\bar{z}_\phi$ is $\frac{1-f}{n}\bar{Y}^2 C_{z\phi}^2 = \frac{1-f}{n}(S_y^2 + \phi^2 S_s^2)$, where $S_y^2$ is the variance of $Y$. Since this does not exceed the mean square error $\frac{1-f}{n}(S_y^2 + S_s^2)$ of the estimators $\bar{z}_a$ and $\bar{z}_s$ whenever $|\phi| \le 1$, the choice of the value of the scalar $\phi$ between $-1$ and $+1$ is justified. We note that Sousa et al. [1] have mentioned in their study (about their proposed estimators) that "there is hardly any difference in the first order and second order approximations for mean square error (MSE) even for small sample sizes". Keeping this in view, we have studied the properties of the proposed estimators in the subsequent sections only to the first order of approximation. The merits of the proposed estimators are examined through numerical illustration.

The suggested ratio estimator
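We consider the following ratio estimator for the population mean $\bar{Y}$ of the study variable $Y$ using the known population mean $\bar{X}$ of the auxiliary variable $X$:

$$t_R^{(\phi)} = \bar{z}_\phi \, \frac{\bar{X}}{\bar{x}}. \tag{2.1}$$

We note that for $\phi = 1$, the proposed estimator reduces to the estimator $t_R^{(1)} = \bar{z}_a \bar{X}/\bar{x}$, which is due to Sousa et al. [1], where $\bar{z}_a$ is the sample mean of the additively scrambled responses. For $\phi = 0$, (2.1) reduces to the classical ratio estimator based on true responses of the variables $Y$ and $X$.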
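Expressing (2.1) in terms of $e_{z\phi}$ and $e_x$ we have

$$t_R^{(\phi)} = \bar{Y} (1 + e_{z\phi})(1 + e_x)^{-1}. \tag{2.4}$$

We assume that $|e_x| < 1$ so that $(1 + e_x)^{-1}$ is expandable. Expanding the right-hand side of (2.4), multiplying out and neglecting terms in the $e$'s of power greater than two, we have

$$t_R^{(\phi)} - \bar{Y} \cong \bar{Y} \left( e_{z\phi} - e_x + e_x^2 - e_{z\phi} e_x \right). \tag{2.5}$$

Taking expectation of both sides of (2.5), we get the bias of $t_R^{(\phi)}$ to the first order of approximation as

$$B(t_R^{(\phi)}) = \frac{1-f}{n} \bar{Y} \left( C_x^2 - \rho_{xz\phi} C_x C_{z\phi} \right) = \frac{1-f}{n} \bar{Y} \left( C_x^2 - \rho_{yx} C_y C_x \right), \tag{2.6}$$

where the second form follows because $S$ is independent of $X$, so that the covariance between $Z_\phi$ and $X$ equals the covariance between $Y$ and $X$. It is observed from (2.6) that the bias of the proposed estimator $t_R^{(\phi)}$ is independent of $\phi$. So whatever the value of $\phi$, the bias of $t_R^{(\phi)}$ remains the same as given in (2.6). Thus the bias of the proposed estimator $t_R^{(\phi)}$ and the bias of the estimator $t_R^{(1)}$ due to Sousa et al. [1] are the same. This fact can also be seen from (2.6) and (2.9).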
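Squaring both sides of (2.5) and neglecting terms in the $e$'s of power greater than two, we have

$$\left( t_R^{(\phi)} - \bar{Y} \right)^2 \cong \bar{Y}^2 \left( e_{z\phi}^2 + e_x^2 - 2 e_{z\phi} e_x \right). \tag{2.7}$$

Taking expectation of both sides of (2.7), we get the mean square error (MSE) of $t_R^{(\phi)}$ to the first degree of approximation as

$$MSE(t_R^{(\phi)}) = \frac{1-f}{n} \bar{Y}^2 \left( C_{z\phi}^2 + C_x^2 - 2 \rho_{xz\phi} C_x C_{z\phi} \right). \tag{2.8}$$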
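As a rough numerical companion to the derivation above, the sketch below computes a ratio estimate and the first-order bias and MSE. It assumes the estimator has the form $\bar{z}_\phi \bar{X}/\bar{x}$, that $E(S) = 0$, and that $S$ is independent of $Y$ and $X$, so that the MSE can be written with variances and covariances in place of coefficients of variation. All function and parameter names (`ratio_estimate`, `s_y2`, `s_yx`, and so on) are hypothetical.

```python
def ratio_estimate(z_sample, x_sample, x_pop_mean):
    """Ratio estimate of the mean of Y: z_bar * X_bar / x_bar."""
    z_bar = sum(z_sample) / len(z_sample)
    x_bar = sum(x_sample) / len(x_sample)
    return z_bar * x_pop_mean / x_bar

def ratio_bias_first_order(n, N, y_mean, x_mean, s_x2, s_yx):
    """First-order bias; phi does not appear because Cov(Z_phi, X) = Cov(Y, X)."""
    lam = (1.0 - n / N) / n          # (1 - f) / n with f = n / N
    R = y_mean / x_mean              # population ratio of means
    return lam * (R * s_x2 - s_yx) / x_mean

def ratio_mse_first_order(n, N, y_mean, x_mean, s_y2, s_x2, s_yx, s_s2, phi):
    """First-order MSE: lam * (S_y^2 + phi^2 S_s^2 + R^2 S_x^2 - 2 R S_yx)."""
    lam = (1.0 - n / N) / n
    R = y_mean / x_mean
    return lam * (s_y2 + phi ** 2 * s_s2 + R ** 2 * s_x2 - 2.0 * R * s_yx)

# Hypothetical population values, used only to exercise the functions.
print(ratio_estimate([2.0, 4.0], [1.0, 2.0], x_pop_mean=3.0))
print(ratio_bias_first_order(10, 100, 10.0, 5.0, s_x2=1.0, s_yx=1.5))
print(ratio_mse_first_order(10, 100, 10.0, 5.0, 4.0, 1.0, 1.5, 1.0, phi=0.5))
```

The variance-covariance form used here is algebraically the same as the coefficient-of-variation form once each term is multiplied by $\bar{Y}^2$.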

Efficiency Comparison
From (1.1) and (2.8) we obtain the condition (2.11). Thus the proposed estimator $t_R^{(\phi)}$ is more efficient than the usual unbiased estimator $\bar{z}_a$ as long as the condition (2.11) is satisfied. The condition (2.11) also ensures that the proposed estimator $t_R^{(\phi)}$ is better than the usual estimator $\bar{z}_s$ based on the subtractive model. Further, comparing (2.8) with (2.10) gives (2.12), and comparing (1.3) with (2.8) gives (2.13).

Thus it follows from (2.11), (2.12) and (2.13) that the suggested estimator $t_R^{(\phi)}$ is more efficient than the unbiased estimators $\bar{z}_a$, $\bar{z}_s$ and $\bar{z}_\phi$, and the ratio-type estimator $t_R^{(1)}$ due to Sousa et al. [1].
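The comparisons in this section can be checked numerically with a small sketch. It assumes $E(S) = 0$ and independence of $S$ from $Y$ and $X$, under which the first-order MSE of the sample mean of scrambled responses is $\frac{1-f}{n}(S_y^2 + \phi^2 S_s^2)$ and that of the ratio estimator adds $R^2 S_x^2 - 2 R S_{yx}$ inside the bracket, with $R = \bar{Y}/\bar{X}$. The helper names and the numerical inputs are hypothetical, and the quantity `K` is the familiar ratio $\rho_{yx} C_y / C_x$ from classical ratio estimation rather than a symbol taken from the paper.

```python
def sampling_factor(n, N):
    """(1 - f) / n for simple random sampling without replacement."""
    return (1.0 - n / N) / n

def mse_mean(lam, s_y2, s_s2, phi):
    """MSE of the sample mean of scrambled responses Z_phi."""
    return lam * (s_y2 + phi ** 2 * s_s2)

def mse_ratio(lam, R, s_y2, s_x2, s_yx, s_s2, phi):
    """First-order MSE of the ratio estimator built on Z_phi."""
    return lam * (s_y2 + phi ** 2 * s_s2 + R ** 2 * s_x2 - 2.0 * R * s_yx)

# Hypothetical population values.
lam = sampling_factor(10, 100)
R, s_y2, s_x2, s_yx, s_s2, phi = 2.0, 4.0, 1.0, 1.5, 1.0, 0.5

# K = rho_yx * C_y / C_x, written with covariances: S_yx / (R * S_x^2).
K = s_yx / (R * s_x2)

# Gain over the sample mean at the same phi: positive exactly when K > 1/2.
gain_over_mean = mse_mean(lam, s_y2, s_s2, phi) - mse_ratio(lam, R, s_y2, s_x2, s_yx, s_s2, phi)

# Gain over the phi = 1 ratio estimator: lam * (1 - phi^2) * S_s^2, never negative for |phi| <= 1.
gain_over_phi_one = (mse_ratio(lam, R, s_y2, s_x2, s_yx, s_s2, 1.0)
                     - mse_ratio(lam, R, s_y2, s_x2, s_yx, s_s2, phi))

print(K, gain_over_mean, gain_over_phi_one)
```

In this setting the gain over the sample mean does not depend on $\phi$, which is consistent with a single condition serving for both the additive and the subtractive model.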

Remark 2.1:
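If the correlation between the scrambled response $Z_\phi$ and the auxiliary variable $X$ is strongly negative, then one can consider the following product-type estimator for the population mean $\bar{Y}$:

$$t_P^{(\phi)} = \bar{z}_\phi \, \frac{\bar{x}}{\bar{X}}.$$

The exact bias of the proposed product-type estimator $t_P^{(\phi)}$ is given by

$$B(t_P^{(\phi)}) = \frac{1-f}{n} \bar{Y} \rho_{yx} C_y C_x, \tag{2.15}$$

which is the same as the bias of the classical product estimator based on true responses of the variables $Y$ and $X$.
It is observed from (2.15) that the bias expression of $t_P^{(\phi)}$ is free from the scalar $\phi$. So whatever the value of $\phi$, the bias of $t_P^{(\phi)}$ remains the same as given in (2.15).
The mean square error of the estimator $t_P^{(\phi)}$ to the first degree of approximation is given by

$$MSE(t_P^{(\phi)}) = \frac{1-f}{n} \bar{Y}^2 \left( C_{z\phi}^2 + C_x^2 + 2 \rho_{xz\phi} C_x C_{z\phi} \right), \tag{2.17}$$

which depends on the value of the scalar $\phi$ through $C_{z\phi}$. So one should be careful in selecting the value of $\phi$.

From (1.3) and (2.17), the estimator $t_P^{(\phi)}$ is more efficient than $\bar{z}_\phi$ if $\rho_{yx} C_y / C_x < -1/2$, which is the same condition under which the classical product estimator $t_P$ is better than the usual unbiased estimator $\bar{y}$.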
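The product-type case can be sketched in the same way as the ratio case. The code assumes the estimator has the form $\bar{z}_\phi \bar{x}/\bar{X}$, that $E(S) = 0$, and that $S$ is independent of $Y$ and $X$; the function names and the negative covariance used in the example are hypothetical.

```python
def product_estimate(z_sample, x_sample, x_pop_mean):
    """Product-type estimate of the mean of Y: z_bar * x_bar / X_bar."""
    z_bar = sum(z_sample) / len(z_sample)
    x_bar = sum(x_sample) / len(x_sample)
    return z_bar * x_bar / x_pop_mean

def product_bias(n, N, x_mean, s_yx):
    """Bias lam * S_yx / X_bar; it does not involve phi."""
    lam = (1.0 - n / N) / n
    return lam * s_yx / x_mean

def product_mse_first_order(n, N, y_mean, x_mean, s_y2, s_x2, s_yx, s_s2, phi):
    """First-order MSE: lam * (S_y^2 + phi^2 S_s^2 + R^2 S_x^2 + 2 R S_yx)."""
    lam = (1.0 - n / N) / n
    R = y_mean / x_mean
    return lam * (s_y2 + phi ** 2 * s_s2 + R ** 2 * s_x2 + 2.0 * R * s_yx)

# Hypothetical values with a negative covariance between Y and X.
print(product_estimate([2.0, 4.0], [1.0, 2.0], x_pop_mean=3.0))
print(product_bias(10, 100, x_mean=5.0, s_yx=-1.5))
print(product_mse_first_order(10, 100, 10.0, 5.0, 4.0, 1.0, -1.5, 1.0, phi=0.5))
```

With these inputs $\rho_{yx} C_y / C_x = -0.75$, so the product-type MSE falls below the MSE of the plain sample mean of scrambled responses.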

Empirical Study
To judge the superiority of the proposed estimator $t_R^{(\phi)}$ over $\bar{z}_a$, $\bar{z}_s$ and the ratio-type estimator $t_R^{(1)}$ due to Sousa et al. [1], we have computed the percent relative efficiencies (PREs) of $t_R^{(\phi)}$ with respect to $\bar{z}_a$, $\bar{z}_s$ and $t_R^{(1)}$, each defined as the ratio of the MSE of the competing estimator to the MSE of $t_R^{(\phi)}$, multiplied by 100. For the purpose of computing the PREs we adopt, for the sake of simplicity, the assumptions mentioned in Sousa et al. [1] and Gupta et al. [2], which involve a scalar expressed in percent. Under these assumptions the PRE formulae given by (2.14), (2.15) and (2.16) simplify, and the computed values lead to the following observations:

I. For fixed values of the percent scalar (10%, 20%, 30%), a larger gain in efficiency is observed by using the proposed estimator $t_R^{(\phi)}$ [...]

II. For the percent scalar equal to 10%, the gain in efficiency by using the proposed estimator $t_R^{(\phi)}$ over the ratio-type estimator due to Sousa et al. [1] [...]

IV. The maximum gain in efficiency is observed when $\phi = 0$, which is expected because the proposed additive model $Z_\phi$ then becomes free from scrambling.

V. For fixed values of the remaining parameters, the values of $PRE(t_R^{(\phi)}, \bar{z}_a)$, $PRE(t_R^{(\phi)}, \bar{z}_s)$ and $PRE(t_R^{(\phi)}, t_R^{(1)})$ increase as the value of the correlation coefficient $\rho_{yx}$ increases.
Overall we conclude that the proposed estimator $t_R^{(\phi)}$ is to be preferred in practice when:
i. the standard deviation of the scrambling variable $S$ is close to the standard deviation of the auxiliary variable $X$;
ii. the value of $\phi$ is close to zero and the value of the correlation coefficient $\rho_{yx}$ is large.
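A percent relative efficiency calculation of this kind can be sketched as follows. The sketch does not use the paper's simplifying assumptions, which are not recoverable here; instead it takes hypothetical population values directly and assumes $E(S) = 0$ with $S$ independent of $Y$ and $X$. The names `pre` and `pre_table` are illustrative. The common factor $(1-f)/n$ cancels in every ratio, so it is omitted.

```python
def pre(mse_reference, mse_candidate):
    """Percent relative efficiency of the candidate with respect to the reference."""
    return 100.0 * mse_reference / mse_candidate

def pre_table(R, s_y2, s_x2, s_yx, s_s2, phis):
    """Rows of (phi, PRE over z_bar_a or z_bar_s, PRE over the phi = 1 ratio estimator)."""
    ratio_term = R ** 2 * s_x2 - 2.0 * R * s_yx
    mean_additive = s_y2 + s_s2               # z_bar_a and z_bar_s share this MSE
    ratio_additive = s_y2 + s_s2 + ratio_term  # ratio estimator with phi = 1
    rows = []
    for phi in phis:
        ratio_phi = s_y2 + phi ** 2 * s_s2 + ratio_term
        rows.append((phi, pre(mean_additive, ratio_phi), pre(ratio_additive, ratio_phi)))
    return rows

# Hypothetical population values.
for phi, pre_mean, pre_ratio in pre_table(2.0, 4.0, 1.0, 1.5, 1.0, (0.0, 0.5, 1.0)):
    print(f"phi={phi:.1f}  PRE vs mean={pre_mean:.2f}  PRE vs phi=1 ratio={pre_ratio:.2f}")
```

For these inputs both PREs are largest at $\phi = 0$ and fall as $|\phi|$ grows, with the PRE over the $\phi = 1$ ratio estimator reaching exactly 100 at $\phi = 1$.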

Proposed Regression Estimator
To obtain the regression estimator of the population mean $\bar{Y}$, we first define the difference estimator for $\bar{Y}$ as $t_d = \bar{z}_\phi + d(\bar{X} - \bar{x})$, where $d$ is a suitably chosen constant. It is easy to verify that the difference estimator $t_d$ is an unbiased estimator of the population mean $\bar{Y}$.
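The variance of the estimator $t_d$ is given by

$$Var(t_d) = \frac{1-f}{n} \left( S_y^2 + \phi^2 S_s^2 + d^2 S_x^2 - 2 d S_{yx} \right),$$

which is minimized for $d = S_{yx}/S_x^2$, where $S_{yx}$ is the covariance between $Y$ and $X$ (equal to the covariance between $Z_\phi$ and $X$, since $S$ is independent of $X$). Replacing the optimum $d$ by its sample estimate yields the regression estimator $t_{lr}$. To obtain the bias of the regression estimator $t_{lr}$, we express it in terms of the relative errors $e$. We assume that $|e_2| < 1$ so that $(1 + e_2)^{-1}$ is expandable. Expanding the right-hand side of (3.7), multiplying out and neglecting terms in the $e$'s of power greater than two leads to (3.12). In the light of (3.13), the expression (3.12) reduces to (3.14). The corresponding expression for the regression estimator $\hat{\mu}_{Reg}$ of Gupta et al. [2], to the first degree of approximation, can also be obtained from (3.14) just by setting $\phi = 1$.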
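The role of the constant $d$ can be illustrated with a short sketch. It assumes the difference estimator has the form $\bar{z}_\phi + d(\bar{X} - \bar{x})$, that $E(S) = 0$, and that $S$ is independent of $Y$ and $X$, so that the variance is $\frac{1-f}{n}(S_y^2 + \phi^2 S_s^2 + d^2 S_x^2 - 2 d S_{yx})$. The function names and numbers are hypothetical, and `regression_estimate` uses the ordinary sample slope as a stand-in for the estimated optimum $d$.

```python
def variance_td(d, lam, s_y2, s_x2, s_yx, s_s2, phi):
    """Variance of the difference estimator for a given constant d."""
    return lam * (s_y2 + phi ** 2 * s_s2 + d ** 2 * s_x2 - 2.0 * d * s_yx)

def optimal_d(s_x2, s_yx):
    """Value of d minimizing the variance: S_yx / S_x^2."""
    return s_yx / s_x2

def regression_estimate(z_sample, x_sample, x_pop_mean):
    """Regression-type estimate using the sample slope of Z_phi on X."""
    n = len(z_sample)
    z_bar = sum(z_sample) / n
    x_bar = sum(x_sample) / n
    s_zx = sum((x - x_bar) * (z - z_bar) for x, z in zip(x_sample, z_sample)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in x_sample) / (n - 1)
    slope = s_zx / s_xx
    return z_bar + slope * (x_pop_mean - x_bar)

# Hypothetical population values.
lam, s_y2, s_x2, s_yx, s_s2, phi = 0.09, 4.0, 1.0, 1.5, 1.0, 0.5
d_star = optimal_d(s_x2, s_yx)
print(d_star, variance_td(d_star, lam, s_y2, s_x2, s_yx, s_s2, phi))
print(regression_estimate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], x_pop_mean=3.0))
```

At the optimum the variance equals $\frac{1-f}{n}(S_y^2 + \phi^2 S_s^2 - S_{yx}^2/S_x^2)$, so under these assumptions the scrambling contributes only through the $\phi^2 S_s^2$ term.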

Empirical Study
To judge the merits of the suggested regression estimator $t_{lr}$ over the regression estimator $\hat{\mu}_{Reg}$ of Gupta et al. [2], we have computed the percent relative efficiencies of $t_{lr}$ with respect to $\hat{\mu}_{Reg}$. The values obtained are larger than 100, so the proposed regression estimator $t_{lr}$ is more efficient than the regression estimator $\hat{\mu}_{Reg}$ of Gupta et al. [2] when $|\phi| < 1$. There is considerable gain in efficiency by using the proposed regression estimator $t_{lr}$ over the regression estimator $\hat{\mu}_{Reg}$ of Gupta et al. [2] when the value of $\phi$ is in the neighborhood of the origin, the value of $\rho_{yx}$ is close to unity and the value of the percent scalar is moderately large. In such situations our recommendation is to use the proposed regression estimator $t_{lr}$ as long as $|\phi| < 1$.