The Determination of Sample Size for Sensitive Issue Successive Survey with the Multiplication RRT Model under Two-Stage and Stratified Two-Stage Random Sampling and Its Application in Medical Survey

When designing a sampling scheme, determining the sample size is essential, and the survey accuracy, the survey cost, and the sampling method must be weighed together. In this article, we discuss how to determine the sample size for complex successive sampling with sample rotation for sensitive issues. Using the Cauchy-Schwarz inequality, we derive formulas for the optimal sample size under two-stage sampling and under stratified two-stage sampling, so as to minimize the cost for a given sampling error and to minimize the sampling error for a given cost.


Introduction
Sampling, a form of incomplete survey, is the most common mode of investigation [1,2]. From a sample taken from the population [3], we estimate the population parameters. When sensitive questions are asked, privacy concerns and response variability lead some respondents to refuse to answer or to give false answers for self-protection [4]. Data collected by conventional direct questioning are then unreliable, and the survey results cannot accurately reflect the true population characteristics. The multiplication model of the Randomized Response Technique (RRT) is therefore used to raise the response rate and obtain more realistic and reliable results [5]. Geng used the RRT model to survey the behavioral risk profile of men who have sex with men in Beijing, China [6]. Successive surveys generally suffer from two disadvantages: sample fatigue and declining sample representativeness. Sample rotation, however, can greatly improve the accuracy of estimators [7]. Yu developed two complex successive sampling methods with sample rotation for sensitive issues, under two-stage sampling and stratified two-stage sampling, respectively [8].
Determining the sample size is a significant part of sampling design and a necessary premise for carrying out the sampling. Nouri presented a method of sampling and sample size determination for a comprehensive integrated community-based interventional study [9]. However, there is no universal solution or perfect prescription for determining the sample size. Wang and Gao derived formulas for the optimum sample size for two-stage sampling [10]. For the empirical judgment of sample size, Chen obtained a method to determine the sample size and used it in the testability verification experiment of fault injection [10]. Jin and Yu derived formulas for the optimum sample sizes for the Randomized Response Technique (RRT) model in stratified two-stage sampling [11].
However, very little research addresses sample size determination for successive surveys with sensitive questions. Yu and Jin gave an estimator of the sample size for a successive survey with partial rotation of clusters under a given cost [12]. Sample size formulas for the two complex successive sampling methods mentioned above are not yet available. In this paper, we derive the optimal sample size formulas for these two sampling methods by using the Cauchy-Schwarz inequality, so as to minimize the cost for a given sampling error and to minimize the sampling error for a given cost [8].
Mathematical Problems in Engineering
The remainder of this paper is organized as follows. In Section 2, we discuss how to determine the sample size of complex successive sampling with sample rotation for sensitive issues under the multiplication RRT model, and we derive the optimal sample size formulas under two-stage sampling and stratified two-stage sampling by using the Cauchy-Schwarz inequality. In Section 3, we give four examples: using the derived formulas, we determine the sample sizes for surveying the monthly number of sexual services per sex girl and the sex girls' age at first sexual service in Xichang City, and the proportion of condom use during anal sex and the monthly average frequency of male-to-male sexual behavior among MSM (men who have sex with men) in Beijing City. Section 4 summarizes the article. All technical details are given in the Appendix.
In the multiplication RRT model [13], a box contains ten balls printed with the numbers 0, 1, 2, ..., 9, respectively. Each respondent randomly draws one ball from the box and fills in the questionnaire with the product of the value of his or her own quantitative sensitive characteristic and the number on the drawn ball.

The Formula Derivation of Sample Size
Two-Stage Sampling. Assume that the population consists of $N$ primary units and that the $i$-th primary unit is composed of $M_i$ secondary units ($i = 1, 2, \ldots, N$); on average, each primary unit contains $\bar{M}$ secondary units. In the first stage, $n$ primary units are drawn from the $N$ primary units by simple random sampling, so the first-stage sampling fraction is $f_1 = n/N$. In the second stage, $m_i$ secondary units are drawn from the $i$-th selected primary unit ($i = 1, 2, \ldots, n$), with $m$ second-stage units drawn on average from each selected primary unit, so the second-stage sampling fraction is $f_2 = m/\bar{M}$. The successive survey with sample rotation is then carried out separately within each selected primary unit. In the $i$-th selected primary unit, the first survey applies the multiplication RRT model to the $m_i$ sampled secondary units. At the $h$-th survey, $u_{hi}$ secondary units are retained at random from the sample of the $(h-1)$-th survey, and $v_{hi} = m_i - u_{hi}$ secondary units (the rotated part) are drawn from the remaining $M_i - m_i$ units of the $i$-th primary unit that were not chosen in the $(h-1)$-th survey. The multiplication RRT model is then used to interview both the $u_{hi}$ reserved and the $v_{hi}$ rotated secondary units. Thus, in the second stage, simple random sampling with sample rotation is applied to the secondary units within each primary unit selected in the first stage.
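To make the rotation step concrete, the following Python sketch simulates one rotation inside a single selected primary unit. It assumes simple random sampling without replacement at every draw; the helper `rotate_sample`, the primary-unit size of 126, the sample size of 20, and the 12 reserved units are hypothetical illustration values, not figures from the surveys in this paper.

```python
import random


def rotate_sample(previous_sample, unit_ids, n_reserved, rng):
    """One rotation step inside a selected primary unit (hypothetical helper).

    Keeps n_reserved secondary units, chosen at random from the previous
    survey's sample, and replaces the others with units drawn at random from
    those that were NOT in the previous sample.
    """
    m = len(previous_sample)
    if not 0 <= n_reserved <= m:
        raise ValueError("n_reserved must lie between 0 and the sample size")
    reserved = rng.sample(previous_sample, n_reserved)
    previous = set(previous_sample)
    not_chosen = [u for u in unit_ids if u not in previous]  # the M_i - m_i units
    rotated = rng.sample(not_chosen, m - n_reserved)
    return reserved, rotated


rng = random.Random(7)
units = list(range(126))          # hypothetical primary unit with 126 secondary units
first = rng.sample(units, 20)     # first survey: simple random sample of 20 units
reserved, rotated = rotate_sample(first, units, 12, rng)
second = reserved + rotated       # sample interviewed at the second survey
print(len(reserved), len(rotated))
```

Repeating the call with `second` as the new `previous_sample` gives the third survey, and so on; the sample size per primary unit stays fixed while part of it is renewed each time.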
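The estimator $\hat{\bar{x}}_{hi}$ of the population mean of the $i$-th primary unit in the $h$-th survey combines, with weight $\Phi_{hi}$, the information from the reserved and the rotated secondary units. Because the first stage uses simple random sampling, the basic properties of the mean give the estimator of the population mean in the $h$-th survey:
$$\hat{\bar{x}}_h = \frac{1}{n}\sum_{i=1}^{n}\hat{\bar{x}}_{hi}. \quad (2)$$
Suppose that the quantitative sensitive characteristic of a respondent is $X$, that the number on the drawn ball is the random variable $Y$, and that the reported answer is the product $Z = XY$. The population means of $X$ and $Z$ are $\bar{X}$ and $\bar{Z}$, respectively, and $\bar{Y}$ is the mean of the numbers printed on all the balls in the box (for balls numbered 0 to 9, $\bar{Y} = 4.5$).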
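According to the basic properties of the mean, and because the number on the drawn ball is independent of the respondent's sensitive value, $\bar{Z} = \bar{X}\,\bar{Y}$. Hence, for the reserved secondary units of the $i$-th primary unit in the $h$-th survey,
$$\hat{\bar{x}}'_{hi} = \frac{\hat{\bar{z}}'_{hi}}{\bar{Y}}, \quad (3)$$
where $\hat{\bar{z}}'_{hi}$ is the sample mean of the answer values for the reserved secondary units of the $i$-th primary unit in the $h$-th survey. By (3), we likewise get $\hat{\bar{x}}'_{h-1,i} = \hat{\bar{z}}'_{h-1,i}/\bar{Y}$, where $\hat{\bar{z}}'_{h-1,i}$ is the sample mean of the answer values for the reserved secondary units of the $i$-th primary unit in the $(h-1)$-th survey. In the same way, for the rotated part,
$$\hat{\bar{x}}''_{hi} = \frac{\hat{\bar{z}}''_{hi}}{\bar{Y}}, \quad (6)$$
where $\hat{\bar{z}}''_{hi}$ is the sample mean of the answer values for the rotated secondary units of the $i$-th primary unit in the $h$-th survey.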
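The ratio estimators above rest on $E(Z) = E(X)E(Y)$, which can be checked by simulation. The sketch below assumes that each respondent draws a ball independently of his or her sensitive value, so that the mean answer equals the mean sensitive value times 4.5, the mean of the numbers 0 to 9. The helper names and the simulated population of 20000 respondents are hypothetical.

```python
import random

BALL_MEAN = sum(range(10)) / 10  # mean of the numbers 0..9 printed on the balls: 4.5


def rrt_answers(true_values, rng):
    """Each respondent reports (own sensitive value) x (number on a randomly drawn ball)."""
    return [x * rng.randrange(10) for x in true_values]


def rrt_mean_estimate(answers, ball_mean=BALL_MEAN):
    """Ratio estimator: mean answer divided by the mean ball number."""
    return sum(answers) / len(answers) / ball_mean


rng = random.Random(2024)
# Hypothetical sensitive values (e.g., counts per month) for 20000 respondents.
true_values = [rng.randint(0, 20) for _ in range(20000)]
answers = rrt_answers(true_values, rng)
true_mean = sum(true_values) / len(true_values)
estimate = rrt_mean_estimate(answers)
print(f"true mean {true_mean:.2f}, RRT estimate {estimate:.2f}")
```

The interviewer never sees an individual's true value, yet the division by $\bar{Y}$ recovers the population mean; the price is extra variance from the randomization, which is one reason RRT surveys need larger samples.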
The Estimator Variance of the Population Mean. The variance of the mean estimator in two-stage sampling is given by (7), where $S_{1h}^2$ is the variance among the primary-unit means. Because the first stage uses simple random sampling, following Cochran [14],
$$S_{1h}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{X}_{hi} - \bar{X}_h\right)^2, \quad (8)$$
where $\bar{X}_h$ is the population mean of the sensitive characteristic in the $h$-th survey and $\bar{X}_{hi}$ is the population mean of the sensitive characteristic of the $i$-th primary unit in the $h$-th survey.
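From (3), we get $\bar{X}_h = \bar{Z}_h/\bar{Y}$ (9) and $\bar{X}_{hi} = \bar{Z}_{hi}/\bar{Y}$ (10). By (8), (9), and (10), we have
$$S_{1h}^2 = \frac{1}{\bar{Y}^2}\cdot\frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Z}_{hi} - \bar{Z}_h\right)^2, \quad (11)$$
where $\bar{Z}_h$ is the population mean of the answer value in the $h$-th survey and $\bar{Z}_{hi}$ is the population mean of the answer value of the $i$-th primary unit in the $h$-th survey.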
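Thus, we obtain the sample estimator of $S_{1h}^2$,
$$s_{1h}^2 = \frac{1}{\bar{Y}^2}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\left(\hat{\bar{z}}_{hi} - \hat{\bar{z}}_h\right)^2,$$
where $\hat{\bar{z}}_h$ is the sample mean of the answer value in the $h$-th survey and $\hat{\bar{z}}_{hi}$ is the sample mean of the answer value of the $i$-th primary unit in the $h$-th survey. Moreover, $S_{2h}^2$ denotes the variance of the secondary units within the primary units; it is expressed through $V(\hat{\bar{x}}_{hi})$, the variance of the estimator of the population mean of the $i$-th primary unit in the $h$-th survey.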
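As a sketch of how answer values are converted back to the scale of the sensitive characteristic, the function below computes the between-primary-unit sample variance of the RRT-adjusted means, the sample analog of (11). It assumes this simple form with divisor $n-1$ and no correction term; the function name and the three-unit data set are hypothetical.

```python
from statistics import mean, variance

BALL_MEAN = 4.5  # mean of the ball numbers 0..9


def between_unit_variance(answers_by_unit, ball_mean=BALL_MEAN):
    """Sample variance (divisor n - 1) of the RRT-adjusted primary-unit means.

    answers_by_unit holds, for each sampled primary unit, the reported answers
    z = x * y. Each unit mean of x is estimated by mean(z) / ball_mean, so the
    result equals the variance of the unit means of z divided by ball_mean**2.
    """
    unit_means = [mean(z) / ball_mean for z in answers_by_unit]
    return variance(unit_means)


# Hypothetical answers from three sampled primary units.
sample = [[9, 0, 18], [45, 45], [0, 0, 27]]
print(round(between_unit_variance(sample), 3))
```

Dividing each unit mean by $\bar{Y}$ before taking the variance is the same as dividing the variance of the answer-value means by $\bar{Y}^2$, which is where the factor $1/\bar{Y}^2$ comes from.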
In the second stage, simple random sampling with sample rotation is used to survey the secondary units of the primary units selected in the first stage. The variance $V(\hat{\bar{x}}_{hi})$ of the estimator of the population mean of the $i$-th primary unit in the $h$-th survey then depends on $\hat{\rho}_{hi}$, the estimated correlation coefficient of the answer values of the $i$-th primary unit between the $h$-th and the $(h-1)$-th surveys.
The sample estimator $\hat{V}(\hat{\bar{x}}_{hi})$ of $V(\hat{\bar{x}}_{hi})$ is built from the variance of the answer values for the rotated sample, the variance of the answer values for the reserved sample, and the sample variance of the answer values for the whole sample in the $i$-th primary unit of the $h$-th survey. From it follows the sample estimator $s_{2h}^2$ of $S_{2h}^2$, and hence the sample estimator $\hat{V}(\hat{\bar{x}}_h)$ of $V(\hat{\bar{x}}_h)$.
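How the correlation $\rho$ between successive surveys drives the optimal weight and the rotation rate can be seen in Cochran's classic two-occasion result, sketched below. This is the textbook version (infinite population, equal variances on both occasions, regression estimator for the matched part), used here as an illustration; it is not necessarily the exact form, with finite-population corrections and per-unit $\hat{\rho}_{hi}$, used in this paper. Function names are hypothetical.

```python
import math


def composite_variance(s2, m, rotated_fraction, rho):
    """Variance of Cochran's two-occasion composite estimator (sketch).

    Infinite-population approximation with the same variance s2 on both
    occasions. The u = (1 - mu) m matched units give a regression estimator with
    variance s2 (1 - rho^2) / u + rho^2 s2 / m; the v = mu m rotated-in units give
    s2 / v. Returns (optimal weight on the matched estimator, combined variance).
    """
    if not 0 < rotated_fraction < 1:
        raise ValueError("rotated_fraction must be strictly between 0 and 1")
    u = (1 - rotated_fraction) * m
    v = rotated_fraction * m
    var_matched = s2 * (1 - rho**2) / u + rho**2 * s2 / m
    var_rotated = s2 / v
    phi = var_rotated / (var_matched + var_rotated)  # inverse-variance weighting
    return phi, var_matched * var_rotated / (var_matched + var_rotated)


def optimal_rotated_fraction(rho):
    """Fraction of the sample to replace that minimises the composite variance."""
    return 1 / (1 + math.sqrt(1 - rho**2))


mu = optimal_rotated_fraction(0.8)
phi, v = composite_variance(s2=1.0, m=100, rotated_fraction=mu, rho=0.8)
print(mu, phi, v)
```

With $\rho = 0.8$, replacing 62.5% of the sample gives variance $S^2(1+\sqrt{1-\rho^2})/(2m)$, smaller than both no rotation and a 50/50 split; with $\rho = 0$ the optimum falls to half the sample and the gain vanishes.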

Optimal Weight and Optimal Sample Rotation Rate.
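Based on simple random sampling in the first stage and simple random rotation sampling within each primary unit, we obtain the optimal weight $\Phi_{hi}$ of the $i$-th primary unit in the $h$-th survey and, from it, the optimal rate of sample rotation for the $i$-th primary unit in the $h$-th survey.
The Determination of Sample Size. In practice, the survey cost often takes the following simple form (Cochran [14]):
$$C = c_0 + c_1 n + c_2 n m,$$
where $c_0$ is the fixed cost, which does not depend on the sample size (for example, renting premises, hiring staff, and publicizing the survey), $c_1$ is the average cost of surveying each primary unit, and $c_2$ is the average cost of surveying each secondary unit; $n$ is the first-stage sample size and $m$ is the second-stage sample size. From (7), we get
$$V(\hat{\bar{x}}_h) + \frac{S_{1h}^2}{N} = \frac{K_1^2}{n} + \frac{K_2^2}{nm}, \quad (22)$$
where $K_1 = \sqrt{S_{1h}^2 - S_{2h}^2/\bar{M}}$ and $K_2 = S_{2h}$. From (20), we have
$$C - c_0 = c_1 n + c_2 n m. \quad (23)$$
Using the Cauchy-Schwarz inequality, from (22) and (23) we get the product
$$\left(V(\hat{\bar{x}}_h) + \frac{S_{1h}^2}{N}\right)(C - c_0) = \left(\frac{K_1^2}{n} + \frac{K_2^2}{nm}\right)(c_1 n + c_2 n m) \ge \left(K_1\sqrt{c_1} + K_2\sqrt{c_2}\right)^2, \quad (24)$$
with equality if and only if
$$\frac{K_1}{n\sqrt{c_1}} = \frac{K_2}{nm\sqrt{c_2}}. \quad (25)$$
From (25), we get the optimal sample size in the second stage:
$$m_{\mathrm{opt}} = \frac{K_2}{K_1}\sqrt{\frac{c_1}{c_2}}. \quad (26)$$
From (23) and (25), we get the coefficient for a fixed survey cost (27), and from (25) and (27), the optimal first-stage sample size for a fixed survey cost:
$$n_{\mathrm{opt}} = \frac{C - c_0}{c_1 + c_2 m_{\mathrm{opt}}}. \quad (28)$$
From (22) and (27), we get the coefficient for a given variance (29), and from (22) and (29), the optimal first-stage sample size for a given variance:
$$n_{\mathrm{opt}} = \frac{K_1^2 + K_2^2/m_{\mathrm{opt}}}{V(\hat{\bar{x}}_h) + S_{1h}^2/N}. \quad (30)$$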
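The optimum can be computed directly once the variance components are estimated. The Python sketch below assumes the variance relation $V + S_{1h}^2/N = K_1^2/n + K_2^2/(nm)$ with $K_1 = \sqrt{S_{1h}^2 - S_{2h}^2/\bar{M}}$, $K_2 = S_{2h}$, and the linear cost $C = c_0 + c_1 n + c_2 nm$. All numeric inputs are hypothetical, chosen only to give round numbers, and the function names are illustrative.

```python
import math


def k_coefficients(s1_sq, s2_sq, m_bar):
    """K1 = sqrt(S1^2 - S2^2 / M_bar) and K2 = S2."""
    k1_sq = s1_sq - s2_sq / m_bar
    if k1_sq <= 0:
        raise ValueError("need S1^2 > S2^2 / M_bar for an interior optimum")
    return math.sqrt(k1_sq), math.sqrt(s2_sq)


def optimal_m(k1, k2, c1, c2):
    """Second-stage size at which the Cauchy-Schwarz bound is attained."""
    return (k2 / k1) * math.sqrt(c1 / c2)


def n_for_fixed_cost(m, total_cost, c0, c1, c2):
    """Solve C = c0 + c1 n + c2 n m for n."""
    return (total_cost - c0) / (c1 + c2 * m)


def n_for_given_variance(m, k1, k2, target_var, s1_sq, N):
    """Solve V + S1^2 / N = K1^2 / n + K2^2 / (n m) for n."""
    return (k1**2 + k2**2 / m) / (target_var + s1_sq / N)


# Hypothetical inputs, chosen only to give round numbers.
k1, k2 = k_coefficients(s1_sq=5.0, s2_sq=16.0, m_bar=16)   # K1 = 2, K2 = 4
m = optimal_m(k1, k2, c1=100.0, c2=4.0)                    # m = 10
n_cost = n_for_fixed_cost(m, total_cost=2100.0, c0=100.0, c1=100.0, c2=4.0)
n_var = n_for_given_variance(m, k1, k2, target_var=0.12, s1_sq=5.0, N=50)
print(m, math.ceil(n_cost), math.ceil(n_var))
```

In practice $m_{\mathrm{opt}}$ is rounded to an integer no larger than $\bar{M}$, the fixed-variance $n$ is rounded up so the precision target is met, and the fixed-cost $n$ is rounded down so the budget is not exceeded.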
Stratified Two-Stage Random Sampling. In the second stage, rotation sampling under simple random sampling is applied to the secondary units of the primary units selected in each stratum.
The estimator of the population mean in stratum $k$ of the $h$-th survey is denoted $\hat{\bar{x}}_{hk}$. Suppose that the quantitative sensitive characteristic of a respondent is $X$, the random number drawn is $Y$, and $Z = XY$ is their product; $\bar{X}$ and $\bar{Z}$ are the population means of $X$ and $Z$, respectively, and $\bar{Y}$ is the mean of the numbers printed on all the balls in the box.
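According to the basic properties of the mean,
$$\bar{X} = \frac{\bar{Z}}{\bar{Y}}. \quad (32)$$
By (32), we get $\hat{\bar{x}}'_{hki} = \hat{\bar{z}}'_{hki}/\bar{Y}$, where $\hat{\bar{z}}'_{hki}$ is the sample mean of the answer values of the reserved sample from the $i$-th primary unit in stratum $k$ of the $h$-th survey.
From (32), we get $\hat{\bar{x}}'_{h-1,ki} = \hat{\bar{z}}'_{h-1,ki}/\bar{Y}$, where $\hat{\bar{z}}'_{h-1,ki}$ is the sample mean of the answer values of the reserved sample from the $i$-th primary unit in stratum $k$ of the $(h-1)$-th survey.
By (32), we get $\hat{\bar{x}}''_{hki} = \hat{\bar{z}}''_{hki}/\bar{Y}$, where $\hat{\bar{z}}''_{hki}$ is the sample mean of the answer values of the rotated sample from the $i$-th primary unit in stratum $k$ of the $h$-th survey.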

Estimators of the Population Mean.
According to the basic properties of the mean, the estimator of the population mean in the $h$-th survey is
$$\hat{\bar{x}}_h = \sum_{k=1}^{L} W_k\,\hat{\bar{x}}_{hk},$$
where $W_k = N_k/N$.
The Estimator Variance of the Population Mean. Two-stage successive sampling with rotation of secondary units is used in each stratum. By (17), we obtain the estimator variance $V(\hat{\bar{x}}_{hk})$ of the population mean in stratum $k$ of the $h$-th survey, with $f_1 = n/N$ and $f_2 = m/\bar{M}$ in each stratum. It involves the variance of the answer values for the rotated sample, the variance of the answer values for the reserved sample, and the sample variance of the answer values for the whole sample from the $i$-th primary unit in stratum $k$ of the $h$-th survey, together with $\hat{\rho}_{hki}$, the estimated correlation coefficient of the answer values from the $i$-th primary unit in stratum $k$ between the $h$-th and $(h-1)$-th surveys.
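According to the basic properties of variance, and because the strata are sampled independently, the variance of the estimator of the population mean in the $h$-th survey is
$$V(\hat{\bar{x}}_h) = \sum_{k=1}^{L} W_k^2\, V(\hat{\bar{x}}_{hk}).$$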
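Combining the stratum results is a weighted sum, sketched below in Python. It assumes known stratum weights $W_k = N_k/N$ that sum to one and independent sampling across strata; the function name and the stratum estimates are hypothetical.

```python
def combine_strata(weights, stratum_means, stratum_variances):
    """Stratified estimator and its variance for independently sampled strata.

    mean = sum_k W_k * xbar_k and variance = sum_k W_k^2 * V_k, where the
    weights W_k = N_k / N are assumed to be known and to sum to one.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    est = sum(w * x for w, x in zip(weights, stratum_means))
    var = sum(w * w * v for w, v in zip(weights, stratum_variances))
    return est, var


# Hypothetical stratum estimates with W1 = W2 = 1/2.
print(combine_strata([0.5, 0.5], [20.0, 30.0], [4.0, 8.0]))
```

Because the weights enter squared, a stratum with half the population contributes only a quarter of its own variance to the overall variance.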

The sample estimator $\hat{V}(\hat{\bar{x}}_h)$ of $V(\hat{\bar{x}}_h)$ is expressed in terms of $\hat{\bar{z}}_{hk}$, the sample mean of the answer value in stratum $k$ of the $h$-th survey, and $\hat{\bar{z}}_{hki}$, the sample mean of the answer value from the $i$-th primary unit in stratum $k$ of the $h$-th survey.
Optimal Weight and Optimal Sample Rotation Rate. Based on simple random sampling in the first stage within each stratum and simple random rotation sampling within each primary unit of each stratum, we obtain the optimal weight $\Phi_{hki}$ of the $i$-th primary unit in stratum $k$ of the $h$-th survey and, from it, the optimal rate of sample rotation for the $i$-th primary unit in stratum $k$ of the $h$-th survey.
Sample Size Determination. In practice, the survey cost often takes a simple linear form (Cochran [14]), in which $c_0$ is the fixed cost, which does not depend on the sample size (for example, renting premises, hiring staff, and publicizing the survey), $c_1$ is the average cost of surveying each primary unit, $c_2$ is the average cost of surveying each secondary unit, $n$ is the first-stage sample size in each stratum, and $m$ is the second-stage sample size. From (39), we obtain the corresponding variance relation. Using the Cauchy-Schwarz inequality, from (49) and (50) we get the product bound, and from (52) we get the optimal sample size in the second stage. From (41) and (51), we get the optimal sample size for a fixed survey cost, and from (45) and (53), the optimal sample size for a given variance.
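For the stratified design, the same Cauchy-Schwarz argument can be sketched per stratum. The version below is an assumed generalization that lets $n_k$ and $m_k$ differ between strata; it may differ from the common-$n$, common-$m$ formulas (53)-(55). It assumes $V + \sum_k W_k^2 S_{1k}^2/N_k = \sum_k W_k^2 (K_{1k}^2 + K_{2k}^2/m_k)/n_k$ and $C = c_0 + \sum_k (c_1 n_k + c_2 n_k m_k)$. At the equality point $m_k = (K_{2k}/K_{1k})\sqrt{c_1/c_2}$ and $n_k \propto W_k K_{1k}$. The function name and the variance components are hypothetical.

```python
import math


def stratified_two_stage_optimum(strata, c0, c1, c2, total_cost=None, target_var=None):
    """Cauchy-Schwarz optimum (n_k, m_k) per stratum for a stratified two-stage design.

    strata: list of dicts with keys "W", "S1_sq", "S2_sq", "M_bar", "N".
    Assumed model (a per-stratum extension of the two-stage case):
        V + sum_k W_k^2 S1_k^2 / N_k = sum_k W_k^2 (K1_k^2 + K2_k^2 / m_k) / n_k
        C = c0 + sum_k (c1 n_k + c2 n_k m_k)
    Exactly one of total_cost or target_var must be given.
    """
    if (total_cost is None) == (target_var is None):
        raise ValueError("give exactly one of total_cost or target_var")
    k1 = [math.sqrt(s["S1_sq"] - s["S2_sq"] / s["M_bar"]) for s in strata]
    k2 = [math.sqrt(s["S2_sq"]) for s in strata]
    m = [b / a * math.sqrt(c1 / c2) for a, b in zip(k1, k2)]
    # At the Cauchy-Schwarz equality point, n_k is proportional to W_k * K1_k.
    base = [s["W"] * a for s, a in zip(strata, k1)]
    if total_cost is not None:
        # Scale so that the whole variable budget C - c0 is spent.
        scale = (total_cost - c0) / sum(b * (c1 + c2 * mk) for b, mk in zip(base, m))
    else:
        # Scale so that the variance relation holds with V = target_var.
        rhs = target_var + sum(s["W"] ** 2 * s["S1_sq"] / s["N"] for s in strata)
        lhs = sum(
            s["W"] ** 2 * (a**2 + b**2 / mk) / nb
            for s, a, b, mk, nb in zip(strata, k1, k2, m, base)
        )
        scale = lhs / rhs
    return [(scale * nb, mk) for nb, mk in zip(base, m)]


# Hypothetical variance components for two strata.
strata = [
    {"W": 0.5, "S1_sq": 5.0, "S2_sq": 16.0, "M_bar": 16, "N": 50},
    {"W": 0.5, "S1_sq": 10.0, "S2_sq": 36.0, "M_bar": 36, "N": 40},
]
plan = stratified_two_stage_optimum(strata, c0=100.0, c1=100.0, c2=4.0, total_cost=2100.0)
print(plan)
```

With a single stratum of weight 1 the function reduces to the two-stage formulas $m_{\mathrm{opt}} = (K_2/K_1)\sqrt{c_1/c_2}$ and $n = (C - c_0)/(c_1 + c_2 m_{\mathrm{opt}})$.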

Applications
Applications of Two-Stage Sampling.
An Application in Xichang City. In 2013, two-stage sampling was employed to estimate the monthly number of sexual services per sex girl in Xichang City. The streets are the primary units and the sex girls are the secondary units. Following the relevant references [15], the permitted error was taken as half the width of the confidence interval (0.0005) at confidence level $1 - \alpha = 0.95$, which gives the given variance $V(\hat{\bar{x}}_h) = 9.6$. Xichang City has 54 streets ($N = 54$), and each street has 126 sex girls on average ($\bar{M} = 126$). The budgeted survey cost is $c_1 = 1500$ dollars per street and $c_2 = 15$ dollars per person, with a fixed cost of $c_0 = 2500$ dollars.
(1) From the survey materials obtained earlier for Xichang City in 2011, we computed the estimates of the relevant quantities from (11).
(2) From (26), we obtained the average number of sex girls to be surveyed in each selected street.
(3) With the cost fixed at $C = 40000$ dollars, (28) gives the number of streets to be surveyed out of all streets in Xichang City.
(4) With the variance fixed at $V(\hat{\bar{x}}_h) = 9.6$, (30) gives the number of streets to be surveyed out of all streets in Xichang City.
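The budget side of step (3) can be made concrete with the Xichang cost figures above. The sketch assumes the cost function $C = c_0 + c_1 n + c_2 nm$. The second-stage sizes 10, 20, and 30 are hypothetical stand-ins for the value that (26) would give, so the printed $n$ values are illustrations, not survey results.

```python
import math


def streets_affordable(m, total_cost=40000.0, c0=2500.0, c1=1500.0, c2=15.0):
    """First-stage size n from the budget C = c0 + c1 n + c2 n m (Xichang cost figures)."""
    return (total_cost - c0) / (c1 + c2 * m)


# Hypothetical second-stage sizes, for illustration only.
for m in (10, 20, 30):
    n = streets_affordable(m)
    print(f"m = {m:2d}: n = {n:.2f}, i.e. at most {math.floor(n)} streets within budget")
```

Rounding $n$ down keeps the survey within the 40000-dollar budget; each extra interview per street costs 15 dollars in every sampled street, so a larger $m$ quickly reduces the number of streets that can be covered.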
An Application in Beijing City. In 2015, two-stage sampling was employed to estimate the proportion of condom use during anal sex in Beijing City. The districts are the primary units and the MSM (men who have sex with men) are the secondary units. Following the relevant references [16], the permitted error was taken as half the width of the confidence interval (0.0005) at confidence level $1 - \alpha = 0.95$, which gives the given variance $V(\hat{\bar{x}}_h) = 0.00033$. Beijing City has 16 districts ($N = 16$), and on average each district has 4234 MSM ($\bar{M} = 117$). The budgeted survey cost is $c_1 = 14678$ dollars per district and $c_2 = 0.45$ dollars per person, with a fixed cost of $c_0 = 12368$ dollars.
(1) From the survey materials obtained earlier for Beijing City in 2010, we computed the estimates of the relevant quantities from (11) and (13).
(2) From (26), we obtained the average number of MSM to be surveyed in each selected district.
(3) With the cost fixed at $C = 147680$ dollars, (28) gives the number of districts to be surveyed out of all districts in Beijing City.
(4) With the variance fixed at $V(\hat{\bar{x}}_h) = 0.00033$, (30) gives the number of districts to be surveyed out of all districts in Beijing City.
Applications of Stratified Two-Stage Random Sampling.
An Application in Xichang City. In 2015, stratified two-stage random sampling was employed to estimate the sex girls' age at first sexual service in Xichang City. By age, the sex girls were divided into two strata ($L = 2$): ages 15 to 29 in the first stratum and ages 30 to 49 in the second. The streets are the primary units and the sex girls are the secondary units. Following the relevant references [15], the permitted error was taken as half the width of the confidence interval (0.0005) at confidence level $1 - \alpha = 0.95$, which gives the given variance $V(\hat{\bar{x}}_h) = 1.95$, with stratum weights $W_1 = W_2 = 1/2$. On average, each stratum has 27 streets ($N = 27$), and each street has 126 sex girls ($\bar{M} = 126$). In the first stage, streets were drawn; in the second stage, sex girls were selected from each chosen street. The fixed survey cost $c_0$ is 2200 dollars, the average cost $c_1$ of surveying each street is 1750 dollars, and the average cost $c_2$ of surveying each sex girl is 17 dollars.
(1) From the results of an earlier survey, we obtained the estimates of the relevant quantities.
(2) From (53), we obtained the average number of sex girls to be sampled from each chosen street.
(3) With the cost fixed at $C = 45000$ dollars, (54) gives the number of streets to be sampled from each stratum in Xichang City.
(4) With the variance fixed at $V(\hat{\bar{x}}_h) = 1.95$, (55) gives the number of streets to be sampled from each stratum in Xichang City.
An Application in Beijing City. In 2015, stratified two-stage random sampling was employed to estimate the monthly average frequency of male-to-male sexual behavior in Beijing City. By age, the MSM were divided into two strata ($L = 2$): ages 15 to 29 in the first stratum and ages 30 to 49 in the second. The entertainment venues (such as gay bars and gay clubs) are the primary units and the MSM are the secondary units. Following the relevant references [8], the permitted error was taken as half the width of the confidence interval (0.0005) at confidence level $1 - \alpha = 0.95$, which gives the given variance $V(\hat{\bar{x}}_h) = 0.385$, with stratum weights $W_1 = 0.5824$ and $W_2 = 0.4176$. Beijing has 3984 entertainment venues, that is, 1992 per stratum ($N = 1992$), and each venue has 17 MSM on average ($\bar{M} = 17$). In the first stage, entertainment venues were drawn; in the second stage, MSM were selected from each chosen venue. The fixed survey cost $c_0$ is 4500 dollars, the average cost $c_1$ of surveying each entertainment venue is 270 dollars, and the average cost $c_2$ of surveying each MSM is 32 dollars.
(1) From the results of an earlier survey, we obtained the estimates of the relevant quantities.
(2) From (53), we obtained the average number of MSM to be sampled from each chosen entertainment venue.
(3) With the cost fixed at $C = 90000$ dollars, (54) gives the number of entertainment venues to be sampled from each stratum in Beijing City.
(4) With the variance fixed at $V(\hat{\bar{x}}_h) = 0.385$, (55) gives the number of entertainment venues to be sampled from each stratum in Beijing City.

Discussion
(1) This paper derives, for the first time, formulas for the optimum sample sizes with sample rotation under two-stage random sampling and stratified two-stage random sampling for sensitive questions. Because the questions are sensitive, we adopt the multiplication RRT model to obtain realistic and reliable data. Successive surveys also suffer from sample fatigue and declining sample representativeness, while sample rotation can greatly improve the accuracy of estimators, so we apply sample rotation to balance these competing effects. Using the derived formulas, we obtained the optimum sample size at each stage for surveying the monthly number of services and the age at first sexual service of sex girls in Xichang City, and the proportion of condom use during anal sex and the monthly average frequency of male-to-male sexual behavior among MSM in Beijing City.
(2) The survey method and statistical formulas of this paper have been applied successfully to survey and analyze sensitive issues among sex service girls in Xichang City, Sichuan Province, which indicates that the formulas work well in practice. The randomized response technique was used with the interviewees, and the multiplication RRT model improved their response rate and made the survey results more authentic and reliable. The results calculated from our formulas provide a scientific basis for health authorities to make regional policies and decisions for effectively controlling HIV/AIDS.
(3) The RRT model has major advantages, but its limitations should not be overlooked. It works by adding random noise to the data, which may introduce errors. Nevertheless, RRT remains a good model for protecting personal information in surveys on sensitive issues: it is more likely than direct questioning to obtain correct data on topics such as premarital sex, premarital pregnancy, and extramarital sex. However, some respondents still give untruthful answers, which reduces the accuracy of the data, and the RRT model requires larger samples than direct questioning. Investigators should therefore be familiar with the principle and operation of the RRT model and gain the respondents' trust, so as to protect privacy and improve reliability and validity. Moreover, non-randomized response models perform better than the RRT model in efficiency and privacy protection, which will be our next research direction. Because the RRT model adds random noise to protect privacy, at some cost in accuracy and efficiency, the formulas derived in this article for the optimal sample size under a given variance or a fixed cost are especially important.