Estimating two sensitive characters with equal probabilities of protection

Abstract: This work seeks to improve the procedure for estimating two sensitive characteristics at a time by introducing equal probabilities of protection on the randomized devices. Equal protection estimators are proposed. It was found that equal probabilities of protection increase efficiency. The proposed model and estimators are shown to be more robust and efficient than the conventional ones.


Introduction
Sample survey research is one of the most essential areas of measurement in applied social science research. It covers any measurement technique that involves asking questions of respondents. Beam (2012), in his book The Problem with Survey Research, argues that all survey research instruments produce unreliable and potentially inaccurate results. In addition, non-response or false response in surveys of human populations involving sensitive characters introduces substantial bias into the estimate of the population parameter. For various reasons, respondents in a sample survey may prefer not to disclose to the interviewer the correct answers to some questions. Such questions may be sensitive or highly personal, which is likely to lead to refusals or ambiguous answers. The resulting bias is difficult to measure. This makes the Randomized Response Technique (RRT) appropriate for such surveys. RRT is a research method used in structured survey interviews; it was first introduced by Warner (1965) and modified by Greenberg, Abul-Ela, Simmons, and Horvitz (1969), Mangat and Singh (1990), Mangat (1994), Odumade and Singh (2009), Adebola and Johnson (2015) and Adebola, Adediran, and Ewemooje (2017). The successes of the methods depend on the

PUBLIC INTEREST STATEMENT
The proposed model is intended to improve the procedure for estimating two sensitive characteristics at a time by introducing equal probabilities of protection on the randomized devices. The proposed model and estimators have been shown to be more robust and efficient than the conventional ones. Therefore, with the use of the proposed model, researchers in the social and behavioural sciences should have little or no problem in surveying sensitive issues such as induced abortion, drug abuse, tax evasion, illegal possession of arms and other embarrassing or illegal activities.

The simple model
A simple random sample of n respondents is selected with replacement from the given population. Two shuffled decks of cards are presented to each respondent in the sample. The two decks are marked Deck I and Deck II, and each deck comprises two types of cards that indicate whether or not the respondent possesses a certain sensitive characteristic. Each respondent is requested to draw one card from each deck, match his or her status with the statements on the cards drawn from Deck I and Deck II, and then respond "Yes" or "No" to each without revealing the statements written on the cards to the interviewer. The probabilities of (Yes, Yes), (Yes, No), (No, Yes) and (No, No) are denoted by $\theta_{11}$, $\theta_{10}$, $\theta_{01}$ and $\theta_{00}$, respectively, as follows: where $\pi_A$, $\pi_B$ and $\pi_{AB}$ are the population proportions of respondents belonging to character A, character B and both A and B, respectively.
The simple model, which is a special case of Christofides (2005), has unbiased estimators of the proportions $\pi_A$, $\pi_B$ and $\pi_{AB}$ given as: where P and T are the probabilities of belonging to sensitive characteristic A in Deck I and sensitive characteristic B in Deck II, respectively, for $T \neq 0.5$ and $P \neq 0.5$. Also, $\hat{\theta}_{11} = n_{11}/n$, $\hat{\theta}_{10} = n_{10}/n$, $\hat{\theta}_{01} = n_{01}/n$ and $\hat{\theta}_{00} = n_{00}/n$.
The variances of the estimators are given as:
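As a rough illustration of how the response probabilities and a moment-type inversion fit together, the following Python sketch assumes a Warner-type design: the card drawn from Deck I reads "I belong to A" with probability P and "I do not belong to A" with probability 1 − P, likewise for B and T in Deck II, and the two draws are independent. Under that assumption the marginal probability of a "Yes" on Deck I is $(2P - 1)\pi_A + (1 - P)$, which is why the inversion fails at P = 0.5. The function names are hypothetical, and the inversion shown is one unbiased way of recovering the proportions; it is not claimed to be the exact form of the published estimators.

```python
from itertools import product


def simple_model_thetas(pi_a, pi_b, pi_ab, p, t):
    """Response probabilities theta[(i, j)] under the assumed Warner-type design."""
    # True-status groups: (population share, belongs to A, belongs to B).
    groups = [
        (pi_ab, True, True),
        (pi_a - pi_ab, True, False),
        (pi_b - pi_ab, False, True),
        (1 - pi_a - pi_b + pi_ab, False, False),
    ]
    theta = {}
    for yes_a, yes_b in product((1, 0), repeat=2):
        total = 0.0
        for share, in_a, in_b in groups:
            # A respondent says "Yes" when the drawn card matches their status.
            p_yes_a = p if in_a else 1 - p
            p_yes_b = t if in_b else 1 - t
            total += (share
                      * (p_yes_a if yes_a else 1 - p_yes_a)
                      * (p_yes_b if yes_b else 1 - p_yes_b))
        theta[(yes_a, yes_b)] = total
    return theta


def invert_simple_model(theta, p, t):
    """Recover (pi_a, pi_b, pi_ab) from cell probabilities or sample proportions."""
    if p == 0.5 or t == 0.5:
        raise ValueError("not identifiable when P = 0.5 or T = 0.5")
    a, b = 2 * p - 1, 2 * t - 1
    # Marginal "Yes" rates identify pi_a and pi_b separately.
    pi_a = (theta[(1, 1)] + theta[(1, 0)] - (1 - p)) / a
    pi_b = (theta[(1, 1)] + theta[(0, 1)] - (1 - t)) / b
    # theta_11 = a*b*pi_ab + a*(1-t)*pi_a + b*(1-p)*pi_b + (1-p)*(1-t)
    pi_ab = (theta[(1, 1)] - a * (1 - t) * pi_a - b * (1 - p) * pi_b
             - (1 - p) * (1 - t)) / (a * b)
    return pi_a, pi_b, pi_ab
```

Passing the sample proportions $n_{ij}/n$ in place of the true cell probabilities gives estimates of the three proportions; because the inversion is linear in the cell proportions, those estimates are unbiased under the stated assumption.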

The crossed model
The procedure and all assumptions are the same as in the simple model. The method also uses two decks of cards, but the statements on the cards differ for the crossed model (see Lee et al., 2013 for details). Under the crossed model, each response, (Yes, Yes), (Yes, No), (No, Yes) and (No, No), may also occur in four different ways for individuals in the population; the probabilities of obtaining each response are given as: The proposed unbiased estimators of the population proportions $\pi_A$, $\pi_B$ and $\pi_{AB}$ for the crossed model are given by: for $P + T \neq 1$.
The variances of these estimators are given as: for P + T ≠ 1.
On the probability of protection for the respondents in each sensitive character, Lee et al. (2013) considered different probabilities but inferred that their proposed models perform better when there are equal probabilities of protection on the sensitive characteristics, i.e. $W_1 = Q_1 = P_1 = T_1 = P = T = 0.7$. Ewemooje and Amahia (2016) also examined the efficiency of Lee et al.'s (2013) simple and crossed models under equal and unequal probabilities of protection. They found that equal probabilities of protection increase efficiency. Hence, this work aims to improve the procedure by introducing equal probabilities of protection on the randomized devices.

Equal protection model
The model allows for equal probabilities of protection on the randomized devices for two sensitive characteristics. The respondent is required to truthfully answer "yes" or "no" to the sensitive question; if "no", the respondent goes through two randomized devices (decks of cards). He or she is required to draw a card from Deck I, which contains two statements:
• I belong to character A, with probability P.
• I do not belong to character A, with probability 1 − P.
The respondent answers "yes" or "no" accordingly, without reporting the statement on the card to the interviewer. The respondent then proceeds to the next stage and follows the same procedure, drawing another card from Deck II, which contains either of the two statements:
• I belong to character B, with probability P.
• I do not belong to character B, with probability 1 − P.
The respondent again answers "yes" or "no" accordingly, without reporting the statement on the card to the interviewer. The procedure is represented in Figure 1.
The responses obtained can be grouped into four classes: $n_{11}$, the number of respondents who answered "yes" to character A and "yes" to character B; $n_{10}$, the number who answered "yes" to character A and "no" to character B; $n_{01}$, the number who answered "no" to character A and "yes" to character B; and $n_{00}$, the number who answered "no" to character A and "no" to character B. Note that $n_{11} + n_{10} + n_{01} + n_{00} = n$, i.e. $\sum_{i=0}^{1}\sum_{j=0}^{1} n_{ij} = n$. Consequently, the probabilities of (yes, yes), (yes, no), (no, yes) and (no, no) are denoted by $\theta_{11}$, $\theta_{10}$, $\theta_{01}$ and $\theta_{00}$, respectively, where
Using the proposed model, the probabilities are expressed as:
The distance between the observed proportions, $\hat{\theta}_{ij}$, and the true probabilities, $\theta_{ij}$, is minimized using the expression:
We then differentiate D with respect to $\pi_A$, $\pi_B$ and $\pi_{AB}$ and equate each derivative to zero, i.e. $\partial D/\partial \pi_A = 0$, $\partial D/\partial \pi_B = 0$ and $\partial D/\partial \pi_{AB} = 0$. The resulting equations are solved simultaneously to obtain unbiased estimators of $\pi_A$, $\pi_B$ and $\pi_{AB}$.
The proposed equal protection unbiased estimators of $\pi_A$, $\pi_B$ and $\pi_{AB}$ are given by: for $P > 0$.
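The minimisation step described above can be sketched numerically. The block below assumes that the distance is the usual least-squares criterion $D = \frac{1}{2}\sum_{i}\sum_{j}(\theta_{ij} - \hat{\theta}_{ij})^2$ and that each $\theta_{ij}$ is linear in $(\pi_A, \pi_B, \pi_{AB})$; both are assumptions made for illustration. The coefficients in the usage lines are hypothetical numbers for a Warner-type device with P = T = 0.7 and are not the expressions of the proposed model. All function names are likewise hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: parameters are not estimable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares_proportions(design, offset, theta_hat):
    """Minimise D = 0.5 * sum((design @ pi + offset - theta_hat) ** 2) over pi.

    Setting dD/dpi = 0 gives the normal equations
    (design^T design) pi = design^T (theta_hat - offset).
    """
    k = len(design[0])
    resid = [th - c for th, c in zip(theta_hat, offset)]
    gram = [[sum(row[i] * row[j] for row in design) for j in range(k)]
            for i in range(k)]
    rhs = [sum(row[i] * r for row, r in zip(design, resid)) for i in range(k)]
    return solve_linear(gram, rhs)


# Hypothetical coefficients: rows are theta_11, theta_10, theta_01, theta_00;
# columns multiply (pi_A, pi_B, pi_AB); offsets are the constant terms.
design = [[0.12, 0.12, 0.16],
          [0.28, -0.12, -0.16],
          [-0.12, 0.28, -0.16],
          [-0.28, -0.28, 0.16]]
offset = [0.09, 0.21, 0.21, 0.49]
theta_hat = [0.166, 0.254, 0.214, 0.366]  # stands in for n_ij / n from a survey
estimates = least_squares_proportions(design, offset, theta_hat)
```

A singular system in this sketch corresponds to a design under which the proportions cannot be estimated, which is the kind of breakdown the parameter restrictions on each model are meant to rule out.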

Tests for unbiasedness of the estimators
In testing for unbiasedness of $\hat{\pi}_A$, we have: Hence, $\hat{\pi}_A$ is unbiased.
In testing for unbiasedness of $\hat{\pi}_B$, we have: Hence, $\hat{\pi}_B$ is unbiased.
In testing for unbiasedness of $\hat{\pi}_{AB}$, we have: Hence, $\hat{\pi}_{AB}$ is also unbiased.

Variance estimation
To obtain the variances of the proposed estimators $\hat{\pi}_A$, $\hat{\pi}_B$ and $\hat{\pi}_{AB}$ presented in Equations (26)-(28), respectively, we have: In solving for these variances, the following variance and covariance operators were used: $V(x_{11}) = \theta_{11}(1 - \theta_{11})$, $V(x_{10}) = \theta_{10}(1 - \theta_{10})$, $V(x_{01}) = \theta_{01}(1 - \theta_{01})$, $V(x_{00}) = \theta_{00}(1 - \theta_{00})$, $C(x_{11}, x_{10}) = -\theta_{11}\theta_{10}$, $C(x_{11}, x_{01}) = -\theta_{11}\theta_{01}$, $C(x_{11}, x_{00}) = -\theta_{11}\theta_{00}$, $C(x_{10}, x_{01}) = -\theta_{10}\theta_{01}$, $C(x_{10}, x_{00}) = -\theta_{10}\theta_{00}$ and $C(x_{01}, x_{00}) = -\theta_{01}\theta_{00}$, where V and C are the operators of variance and covariance over the randomized response device, respectively. Here $x_{11}$ is a "yes" response for character A and a "yes" response for character B, $x_{10}$ is a "yes" response for character A and a "no" response for character B, $x_{01}$ is a "no" response for character A and a "yes" response for character B, and $x_{00}$ is a "no" response for character A and a "no" response for character B.
Solving this rigorously, the variances of the unbiased estimators $\hat{\pi}_A$, $\hat{\pi}_B$ and $\hat{\pi}_{AB}$ are given by: for $P > 0$.
The sample estimators of the variances $V(\hat{\pi}_A)$, $V(\hat{\pi}_B)$ and $V(\hat{\pi}_{AB})$ are given as follows: for $P > 0$.
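Because the counts $(n_{11}, n_{10}, n_{01}, n_{00})$ are multinomial, the variance of any estimator that is linear in the sample proportions follows mechanically from the variance and covariance operators: each cell contributes $\theta(1 - \theta)/n$, and each pair of distinct cells contributes a covariance of $-\theta\theta'/n$. The Python sketch below evaluates that expansion for an arbitrary weight vector. The function name and the weights are hypothetical inputs for illustration; they are not the coefficients of the proposed estimators.

```python
def linear_estimator_variance(weights, theta, n):
    """Var(sum_k w_k * theta_hat_k) for multinomial sample proportions.

    weights, theta: sequences over the cells (11, 10, 01, 00).
    n: sample size.
    """
    if abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to one")
    var = 0.0
    for i, (w_i, th_i) in enumerate(zip(weights, theta)):
        var += w_i ** 2 * th_i * (1 - th_i)        # V(x_i) term
        for j, (w_j, th_j) in enumerate(zip(weights, theta)):
            if j != i:
                var -= w_i * w_j * th_i * th_j      # C(x_i, x_j) term
    return var / n
```

As a sanity check, weights (1, 1, 0, 0) pick out the marginal "yes" proportion for character A, and the function then returns the familiar binomial variance $p(1 - p)/n$ with $p = \theta_{11} + \theta_{10}$.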

Remarks
An unbiased estimator of the proportion of people belonging to at least one of the characteristics A or B is given as:
An unbiased estimator of the proportion of people belonging to characteristic A only and not B is given as:
An unbiased estimator of the proportion of people belonging to characteristic B only and not A is given as:
An unbiased estimator of the population total of people belonging to both sensitive characteristics in a target population of size N is given as:
It then follows that the variances of the estimates of the population parameters are given by: where $K = N/n$ and $P > 0$.
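The derived measures in these remarks are linear in the three basic estimators, so they can be computed directly once those are available, and they inherit unbiasedness from them. The sketch below assumes the usual set identities (inclusion-exclusion for "at least one", subtraction of the overlap for "A only" and "B only") and a population total of the form $N\hat{\pi}_{AB}$. The function and key names are hypothetical.

```python
def derived_measures(pi_a_hat, pi_b_hat, pi_ab_hat, population_size):
    """Derived quantities from estimates of pi_A, pi_B and pi_AB."""
    return {
        # P(A or B) = P(A) + P(B) - P(A and B)
        "at_least_one": pi_a_hat + pi_b_hat - pi_ab_hat,
        # P(A and not B) = P(A) - P(A and B)
        "a_only": pi_a_hat - pi_ab_hat,
        # P(B and not A) = P(B) - P(A and B)
        "b_only": pi_b_hat - pi_ab_hat,
        # Estimated number of people holding both characteristics
        "total_both": population_size * pi_ab_hat,
    }
```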

Efficiency comparison
It has been shown by several authors that, to ensure reliable results in practical surveys and to increase the confidentiality of the respondent, the probabilities of protection should be greater than 0.5 but not too high. Hence, we set P = 0.7 and T = 0.7 for the purpose of comparing the proposed model with the models of Lee et al. (2013).

Efficiency comparison with simple model
Here, we compare the proposed model with the simple model by obtaining the percentage relative efficiency of the proposed estimators with respect to the simple model estimators using Equations (40)-(42):
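For reference, the percentage relative efficiency can be computed as in the short sketch below. It assumes the conventional definition, in which the variance of the competing (reference) estimator is divided by the variance of the proposed estimator and multiplied by 100, so that values above 100 favour the proposed estimator. The function name and the variances in the usage line are hypothetical.

```python
def percent_relative_efficiency(var_reference, var_proposed):
    """PRE of the proposed estimator with respect to a reference estimator."""
    if var_reference < 0 or var_proposed <= 0:
        raise ValueError("variances must be non-negative and var_proposed positive")
    return 100.0 * var_reference / var_proposed


# Hypothetical variances: a reference variance six times the proposed one.
pre = percent_relative_efficiency(0.006, 0.001)
```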

Efficiency comparison with crossed model
Comparing the proposed model with the crossed model using the percentage relative efficiency of the proposed estimators with respect to the crossed model estimators with the use of Equations (43)-(45), we obtained the results shown in Tables 3 and 4.

Results and discussion
The proposed model is more efficient than the simple model of Lee et al. (2013) at all levels. The percentage relative efficiency (PRE) of $\hat{\pi}_A$ and $\hat{\pi}_B$ ranges between 597.61 and 1,223.01 each, while the PRE of $\hat{\pi}_{AB}$ ranges between 706.23 and 824.68. Also, the PRE of $\hat{\pi}_A$ and $\hat{\pi}_B$ increases as $\pi_A$ and $\pi_B$ increase, while the PRE of $\hat{\pi}_{AB}$ decreases (see Table 1). Table 2 shows that, at all levels of $\pi_{AB}$, the PRE decreases as the values of $\pi_A$ and $\pi_B$ increase. When $\pi_{AB} = 0.05$, the PRE ranges between 706.23 and 824.68; when $\pi_{AB} = 0.1$, it ranges between 701.58 and 812.81; and when $\pi_{AB} = 0.2$, it ranges between 713.55 and 831.59. However, as the value of $\pi_{AB}$ increases, there is an increase in the PRE.
The proposed model is more efficient than the crossed model of Lee et al. (2013) when the two sensitive characteristics are considered independently, while for the interaction the proposed model is more efficient only when $\pi_A + \pi_B \leq 0.7$. The PRE of $\hat{\pi}_A$ and $\hat{\pi}_B$ ranges between 103.93 and 332.53 each, while the PRE of $\hat{\pi}_{AB}$ ranges between 108.78 and 207.48, as seen in Table 3. Here, however, the PRE of $\hat{\pi}_A$ and $\hat{\pi}_B$ decreases as $\pi_A$ and $\pi_B$ increase, and the PRE of $\hat{\pi}_{AB}$ also decreases.
In Table 4, when $\pi_{AB} = 0.05$ and $\pi_{AB} = 0.1$, the proposed model becomes less efficient than the crossed model once $\pi_A + \pi_B > 0.7$, whereas when $\pi_{AB} = 0.2$ the proposed model is more efficient than the crossed model at all levels of $\pi_A$ and $\pi_B$. Therefore, as the value of $\pi_{AB}$ increases, an increase in the PRE is observed.

Conclusion
It has been shown that the proposed model is more efficient than the simple model of Lee et al. (2013) at all levels. The proposed model is also more efficient than the crossed model of Lee et al. (2013) when the two sensitive characteristics are considered independently, while for the interaction the proposed model is more efficient when $\pi_A + \pi_B \leq 0.7$. This is because less interaction occurs as the sum approaches one; however, when $\pi_{AB}$ increases to 0.2, the proposed model becomes more efficient at all levels. It was also observed that the estimators break down when P = 0.5 or T = 0.5 in the simple model and when P + T = 1 in the crossed model, while the proposed model is estimable at all values of P greater than zero. Therefore, it is concluded that the proposed model and estimators are more robust and efficient than the conventional estimators.