Designing randomized response surveys to support honest answers to stigmatizing questions

  • Original Paper
Review of Economic Design

Abstract

Randomized response survey methods use noise to mask respondents’ answers to stigmatizing questions in an attempt to elicit honest responses. Respondents weigh the preference for honesty against the disutility of stigmatization when deciding how to answer. Since the disutility of stigmatization depends on the degree of noise, the interviewer designs the survey to balance two goals: (i) honest reporting by respondents and (ii) maximization of the accuracy of estimates based on the survey. We fully characterize the non-linear set of design parameters that lead to truth-telling, as well as the interviewer’s equilibrium survey design.
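As a rough illustration of how noise masks an individual answer while still allowing population-level inference, the following sketch simulates honest responses under a Warner-style design. The function names, the seed, and the parameter values are assumptions made for illustration, and the simple moment estimator shown here is the classical one from the statistics literature, not the Bayesian estimator analyzed in this paper.

```python
import random

def simulate_responses(v, p, n, rng):
    """Draw n honest randomized responses (True means the answer 'y')."""
    responses = []
    for _ in range(n):
        is_s = rng.random() < v            # respondent is the stigmatized type s w.p. v
        asked_about_s = rng.random() < p   # device asks "are you s?" w.p. p, else "are you t?"
        # An honest respondent answers 'y' exactly when the question matches the type,
        # so P(y | v) = p*v + (1 - p)*(1 - v).
        responses.append(is_s == asked_about_s)
    return responses

def moment_estimate(responses, p):
    """Classical moment estimator of v; requires p != 1/2."""
    y_share = sum(responses) / len(responses)
    return (y_share - (1 - p)) / (2 * p - 1)

rng = random.Random(0)  # fixed seed, chosen arbitrarily
responses = simulate_responses(v=0.3, p=0.75, n=20000, rng=rng)
v_hat = moment_estimate(responses, p=0.75)
print(round(v_hat, 3))
```

No single answer reveals a respondent's type when p is below one, yet the share of y answers still identifies v as long as p differs from one-half.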

Notes

  1. While the exact notion of a stigmatizing question depends on the social and cultural environment, modern examples include, “Can you read,” “Do you use cocaine,” and “Are you a cheater?”

  2. Recent examples include: Karlan and Zinman (2012) on micro-finance; Martinez-Sanchez and Rosa-Garcia (2012) on classroom cheating; Musch et al. (2001) on tax evasion; and Solomon et al. (2007) on illegal resource use. These recent studies join a classical literature which includes: Goodstadt and Gruson (1975) on drug use; Wimbush and Dalton (1997) on employee theft; and many more.

  3. Chaudhuri and Mukerjee (2020) provide a comprehensive review of the statistics literature on RRT and its generalizations. Additionally, Rueda et al. (2016) and Tourangeau and Yan (2007) provide related discussions about the computational and psychological elements of RRT and its generalizations.

  4. Flannery (2018) further shows that classical privacy measures may not fully reflect respondents’ relative incentives for honesty and dishonesty. He does this by creating a new randomized response method with the same amount of privacy as previous techniques but different incentives for honesty versus dishonesty.

  5. There is a parallel between the RRT and communication games since noise in these games can improve the equilibrium flow of information, much like noise in the RRT enables honest responses; see Blume et al. (2007, 2019) for details.

  6. As a consequence of their foci, the prior literatures view the randomization probabilities of the RRT as the only way to motivate respondents’ honesty. In light of this, our third contribution is the insight that the sample size also motivates honesty.

  7. In a related experiment, John et al. (2018) show that a variant of RRT, termed “forced response,” performs poorly. Under their variant of forced response, it is readily verified that honesty leads to a posterior belief for the interviewer where respondents who answer no are assessed to be non-stigmatized with certainty. Non-stigmatized respondents thus never answer yes for a wide range of preferences and so an honest equilibrium fails to exist; details are available upon request.

  8. Generalizations and extensions of our simple game are developed in Sect. 6.

  9. It is without loss to take \(p\ge \frac{1}{2}\) since \(p<\frac{1}{2}\) merely reverses the roles of y and n responses in our analysis.

  10. For expositional simplicity, respondents do not update their priors about v based on their types; we relax this assumption in Sect. 6.

  11. A rich literature in experimental economics documents a robust preference for honesty—see, for instance (Gneezy 2005; Lopez-Perez and Spiegelman 2013; Sanchez-Pages and Vorsatz 2009).

  12. As is standard in the literature on RRT (e.g., Blume et al. 2019), the respondent’s payoff depends on the interviewer’s posterior belief rather than the respondent’s second-order beliefs about the interviewer. This modeling approach anticipates the equilibrium consistency of all such beliefs. See Battigalli and Dufwenberg (2009) for additional details on the theory of dynamic psychological games.

  13. It is readily verified that \(\pi (\varvec{r})>0\) for each \(\varvec{r}\in \{y,n\}^{N}\) when respondents are honest.

  14. Since there is only one respondent, we suppress the dependency of the respondent’s payoff on the total number of respondents and the vector of co-respondent’s strategies.

  15. All proofs are provided in Appendix A.

  16. Given a fixed ratio of successes to failures, a Bayesian posterior collapses in the number of independently and identically distributed signals; see, for instance, Mood et al. (1974).

  17. This property generalizes Proposition 2 of Blume et al. (2019), which established that a(1, p) is strictly increasing in p.

  18. We imagine, for instance, either (i) that the interviewer’s employer takes an action based on \(\hat{v}(\varvec{r})\) that reveals v or (ii) that the stigma around being an s-type fades over time and so v is revealed, enabling mean square compensation.

  19. Strictly speaking, the second restriction implies the first, but we treat them separately so that we may generalize the solution concept in Sect. 6.

  20. For expositional simplicity, \(\xi \) is normalized to 1 in light of Lemma 1.

  21. It is easy to show that \(\mathcal{C}\) is non-empty and compact. Non-emptiness follows from the facts that \(c<B\) and \(a(N,\frac{1}{2})=0\), so \((1,\frac{1}{2})\in \mathcal{C}\) since \(\lambda >0\). Since \(a(N,p)\) is continuous in p per Lemma 2, we have

    $$\begin{aligned} \mathcal{C}_{n}=\{(n,p)\in \{n\}\times [\frac{1}{2},1]|a(n,p)\le \lambda \} \end{aligned}$$

    is compact for each \(n\in \{1,2,\ldots ,\lfloor \frac{B}{c}\rfloor \}\). Thus, \(\mathcal{C}=\cup _{n=1}^{\lfloor \frac{B}{c}\rfloor }\mathcal{C}_{n}\) is compact since \(\lfloor \frac{B}{c}\rfloor <\infty \).

  22. It is straightforward to show that \(u_{I}(N,p,\varvec{\sigma }^{\star },\hat{v})\) is continuous in \((N,p)\) on \(\mathcal{C}(\lambda )\). For \((N,p)\in \mathcal{C}\), respondent i answers honestly under \(\varvec{\sigma }^{\star }\) and so

    $$\begin{aligned} \pi (r_{i}|v)={\left\{ \begin{array}{ll} pv+(1-p)(1-v) &{} \text {if }r_{i}=y\\ 1-(pv+(1-p)(1-v)) &{} \text {if }r_{i}=n, \end{array}\right. } \end{aligned}$$

    is continuous in p. Consequently,

    $$\begin{aligned} \pi (\varvec{r}|v)=\frac{\prod _{i=1}^{N}\pi (r_{i}|v)}{\int _{0}^{1}\prod _{i=1}^{N}\pi (r_{i}|v)dv}\text { and }u_{I}(N,p,\varvec{\sigma }^{\star },\hat{v})=\int _{0}^{1}\sum _{\varvec{r}\in \{y,n\}^{N}}-(\hat{v}(\varvec{r})-v)^{2}\pi (\varvec{r}|v)dv \end{aligned}$$

    are both continuous in both p and N. It follows that

    $$\begin{aligned} \mu ^{\star }=\frac{\pi (\varvec{r}|\varvec{\theta },v)\Pi _{i=1}^{N}v^{\mathbb {I}(\theta _{i}=s)}(1-v)^{\mathbb {I}(\theta _{i}=t)}}{\pi (\varvec{r})} \end{aligned}$$

    is also continuous in both p and N.

  23. While the construction of \((\varvec{\sigma }^{\star },N^{\star },p^{\star },\mu ^{\star })\) involves respondent dishonesty when \((N,p)\) is such that \(a(N,p)>\lambda \), it only does so to fulfill the technical requirements of a perfect Bayesian equilibrium. This requirement is that posterior beliefs and respondents’ strategies be defined for all \((N,p)\) and be sequentially rational. In actuality, such values of \((N,p)\) with \(a(N,p)>\lambda \) cannot be played because of the PH requirement. Thus, lying is a moot, technical concern.

  24. For expositional simplicity, we ignore differences in equilibria that emerge in sub-games after the interviewer plays a strategy \((N,p)\) not meeting the PH requirement.

  25. When \(p=1\) an honest type s (t) always answers y (n), but as p decreases his answer becomes increasingly random.

  26. The practical implementation of the optimal survey design requires that the interviewer know \(\lambda \). If it is unknown, it may be estimated via survey techniques based around contingent valuation (e.g., Hanemann (1991) and Blumenschein et al. (2008)) or via laboratory experiments.

  27. Instead of garbling answers, methods like those of Greenberg et al. (1969) incentivize honesty through aggregation: respondents report the total number of yes and no answers to a battery of stigmatizing and non-stigmatizing questions.

  28. For expositional simplicity, we suppress references to \(f_{v}\) unless required.

  29. Mood et al. (1974) provide a detailed discussion on derived probability distributions.

  30. For expositional simplicity, we abuse notation and treat \(\mu \) as a measure over \((\varvec{\theta },v,\varvec{r})\) with joint probability \(\mu (\varvec{\theta },v|\varvec{r})\pi (\varvec{r})\), where \(\mu (\varvec{\theta },v|\varvec{r})\) is given by Eq. (11).

  31. To build intuition, the RRT generates Nr responses of y and \(N(1-r)\) responses of n given \(Z=Nr\). Thus, as N grows large, the interviewer’s posterior belief of the probability of a y response (i.e., his conditional expectation of the value of \(\tau \)) collapses to \(Nr/N=r\) by the full support assumption of \(f_{v}\). Since \(\lim _{N\rightarrow \infty }\mathbb {E}(\tau |Nr)\) is finite by Doob’s Martingale Convergence Theorem, it follows that \(\lim _{N\rightarrow \infty }\mathbb {E}(\tau |Nr)=r\). While it is true that \(1-p\le \mathbb {E}(\tau |Z)\le p\) due to the bounds of integration, as N grows large the distribution of R collapses to the specified value of \(\tau \), which is contained in \([1-p,p]\), and so the preceding argument goes through. Miller (2016) overviews Doob’s Theorem and provides a detailed development of the consistency of the Bayesian estimator in the related context of random samples.

References

  • Battigalli P, Dufwenberg M (2009) Dynamic psychological games. J Econ Theory 144(1):1–35

  • Blume A, Board O, Kawamura K (2007) Noisy talk. Theor Econ 2(4):395–440

  • Blume A, Lai E, Lim W (2019) Eliciting private information with noise: the case of randomized response. Games Econ Behav 113:356–380

  • Blumenschein K, Blomquist G, Johannesson M, Horn N, Freeman P (2008) Eliciting willingness to pay without bias: evidence from a field experiment. Econ J 118(525):114–137

  • Boruch R (1971) Assuring confidentiality of responses in social research: a note on strategies. Am Sociol 6(4):308–311

  • Chaudhuri A, Mukerjee R (2020) Randomized response: theory and techniques. CRC Press, Boca Raton

  • Flannery T (2018) A new methodology for surveys and its application to forced response. Math Soc Sci 91:17–24

  • Gneezy U (2005) Deception: the role of consequences. Am Econ Rev 95(1):384–394

  • Goodstadt M, Gruson V (1975) The randomized response technique: a test on drug use. J Am Stat Assoc 70(352):814–818

  • Greenberg B, Abul-Ela A, Simmons W, Horvitz D (1969) The unrelated question randomized response model: theoretical framework. J Am Stat Assoc 64(326):520–539

  • Hanemann W (1991) Willingness to pay and willingness to accept: how much can they differ? Am Econ Rev 81(3):635–647

  • John L, Loewenstein G, Acquisti A, Vosgerau J (2018) When and why randomized response techniques (fail to) elicit the truth. Organ Behav Hum Decis Process 148:101–123

  • Karlan D, Zinman J (2012) List randomization for sensitive behavior: an application for measuring use of loan proceeds. J Dev Econ 98(1):71–75

  • Leysieffer F, Warner S (1976) Respondent jeopardy and optimal designs in randomized response models. J Am Stat Assoc 71(355):649–656

  • Ljungqvist L (1993) A unified approach to measures of privacy in randomized response models: a utilitarian perspective. J Am Stat Assoc 88(421):97–103

  • Lopez-Perez R, Spiegelman E (2013) Why do people tell the truth? Experimental evidence for pure lie aversion. Exp Econ 16(3):233–247

  • Martinez-Sanchez F, Rosa-Garcia A (2012) Measuring the frequency of cheating among students using a randomized list technique. Universitat de Valencia 10:55–59

  • Miller J (2016) Lecture notes on advanced stochastic modeling: consistency, asymptotic normality, and coverage. Working Paper, Duke University

  • Mood A, Graybill F, Boes D (1974) Introduction to the theory of statistics, 3rd edn. McGraw-Hill, New York

  • Moshagen M, Hilbig B, Erdfelder E, Moritz A (2014) An experimental validation method for questioning techniques that assess sensitive issues. Exp Psychol 61(1):48–54

  • Musch J, Broder A, Klauer K (2001) Improving survey research on the world-wide web using the randomized response technique. Dimensions of Internet Science, pp 179–192

  • Rueda M, Cobo B, Arcos A, Arnab R (2016) Chapter 10: software for randomized response techniques. Handbook Statist 34:155–167

  • Sanchez-Pages S, Vorsatz M (2009) Enjoy the silence: an experiment on truth-telling. Exp Econ 12(2):220–241

  • Solomon J, Jacobson S, Wald K, Gavin M (2007) Estimating illegal resource use at a Ugandan park with the randomized response technique. Hum Dimens Wildl 12(2):75–88

  • Tourangeau R, Yan T (2007) Sensitive questions in surveys. Psychol Bull 133(5):859–883

  • Umesh U, Peterson R (1991) A critical evaluation of the randomized response method applications, validation, and research agenda. Sociol Methods Res 20(1):104–138

  • Warner S (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60(309):63–69

  • Wimbush J, Dalton D (1997) Base rate for employee theft: convergence of multiple methods. J Appl Psychol 82(5):756–763

  • Winkler R, Franklin L (1979) Warner’s randomized response model: a Bayesian approach. J Am Stat Assoc 74(365):207–214

Author information

Corresponding author

Correspondence to Timothy J. Flannery.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A previous version of this work circulated under the title, “Analyzing Warner’s Response Technique using Uniform Priors and Linear Utility with Sufficient Conditions for General Priors and Non-linear Utility.” The authors are grateful for comments, discussion, and insights from Andreas Blume, Patrick Cizek, Stan Reynolds, Stephen Roberts, Mark Walker, Abbie Zhang, and several anonymous referees, as well as seminar participants at the University of Arizona. The authors have no conflicts of interest regarding this work.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (zip 389 KB)

A Appendix: Proofs

This appendix collects the proofs.

Proof of Lemma 1. Obvious and omitted. \(\square \)

Proof of Lemma 2. Obvious and omitted. \(\square \)

Proof of Proposition 3. Follows directly from Proposition 11. \(\square \)

Proof of Proposition 4. Towards a contradiction, suppose the first claim is false. Then, there is a \(p_{0}\in (\frac{1}{2},1)\) such that \(a(N,p_{0})\ge a(N,p)\) for all \(p\ge p_{0}\). But, we have that (i) \(a(\infty ,p)\) is strictly increasing and onto and that (ii) \(a(N,p_{0})\le a(1,p_{0})=2p_{0}-1<1\) since \(p_{0}<1\). Thus, there is a \(\tilde{p}<1\) such that \(a(N,p_{0})=a(\infty ,\tilde{p})\). But then for \(p>\tilde{p}\), we have \(a(N,p_{0})<a(\infty ,p)\le a(N,p)\) by Proposition 3, an impossibility. \(\square \)

Proof of Corollary 5. Obvious and omitted. \(\square \)

Proof of Corollary 6. Obvious and omitted. \(\square \)

Proof of Lemma 7. Obvious and omitted. \(\square \)

Proof of Lemma 8. Obvious and omitted. \(\square \)

Proof of Proposition 9. Obvious and omitted. \(\square \)

Proof of Lemma 10. Obvious and omitted. \(\square \)

Proof of Proposition 11. We only establish part (i) of the proposition since the argument for part (ii) parallels the proof of Proposition 4. The proof is by induction. It is broken into four parts:

  • Part 0 describes notation as well as transformations and preliminary results on the random variables used in the proof.

  • Part 1 develops the lower bound on a(1, p) and shows that \(a(1,p)>a(2,p)\) for \(p\in (\frac{1}{2},1)\) (see Note 28).

  • Part 2 establishes that, when \(a(N,p)>a(N+1,p)\) for all \(p\in (\frac{1}{2},1)\), then \(a(k,p)>a(k+1,p)\) for all \(p\in (\frac{1}{2},1)\).

  • Part 3 gives the upper bound on \(a(\infty ,p)\).

Part 0: Our proof uses a transformation of \(a(N,p)\), which allows us to establish the desired result by focusing on means and variances. Key to the proof is the variable \(\tau =pv+(1-p)(1-v)\). To build intuition, we rewrite the interviewer’s posterior belief under honesty. After he observes responses \(\varvec{r}\), his Bayesian posterior regarding v is \(h(v|Y(\varvec{r}))\), where

$$\begin{aligned} h(v|Y)=\frac{\tau ^{Y}(1-\tau )^{N-Y}f_{v}(v)}{\int _{0}^{1}\tau ^{Y}(1-\tau )^{N-Y}f_{v}(v)dv}, \end{aligned}$$

because each honest response is an identically distributed and conditionally independent Bernoulli random variable given v. The conditional probability of success of this Bernoulli variable is \(\tau \). Hence, given responses \(\varvec{r}=(r_{1},\ldots ,r_{N})\), the interviewer’s posterior belief regarding \(\varvec{\theta }=(\theta _{1},\ldots ,\theta _{N})\) and v may be rewritten as

$$\begin{aligned} \mu (\varvec{\theta },v|\varvec{r})=h(v|Y(\varvec{r}))\tilde{g}(\varvec{\theta }|v,\varvec{r}), \end{aligned}$$

where \(\tilde{g}\) is the conditional probability of types \(\varvec{\theta }\) given \((v,\varvec{r})\). But, given v, the probability that respondent i is type \(\theta _{i}\) only depends on his own response \(r_{i}\); it is independent of others’ responses. This is because (i) types are conditionally independent given v and (ii) every other respondent j’s answer only depends on j’s type and question. It follows that \(\tilde{g}(\varvec{\theta }|v,\varvec{r})=\prod _{i=1}^{N}g(\theta _{i}|v,r_{i})\), where

$$\begin{aligned} g(\theta _{i}|v,r_{i})&=\left( \frac{(pv)^{\mathbb {I}(\theta _{i}=s)}((1-p)(1-v))^{1-\mathbb {I}(\theta _{i}=s)}}{\tau }\right) ^{Y(r_{i})}\\&\quad \times \left( \frac{((1-p)v)^{\mathbb {I}(\theta _{i}=s)}(p(1-v))^{1-\mathbb {I}(\theta _{i}=s)}}{1-\tau }\right) ^{1-Y(r_{i})} \end{aligned}$$

is readily verified to be the probability that respondent i is type \(\theta _{i}\) given v and his response \(r_{i}\) under honesty. Hence, under honesty,

$$\begin{aligned} \mu (\varvec{\theta },v|\varvec{r})=h(v|Y(\varvec{r}))\prod _{i=1}^{N}g(\theta _{i}|v,r_{i}). \end{aligned}$$
(11)

Equation (11) implies that the interviewer’s belief regarding respondent i’s type \(\mu (\theta _{i}|\varvec{r})\) depends only on (i) i’s own response and (ii) the total number of y answers (including i’s response). Thus, \(a(N,p)\) may be rewritten in terms of \(r_{i}\) and Y. As is evident from Eq. (11), this transformation leverages \(\tau \). Specifically, as we will see in the body of the proof, the re-expression depends on the mean and variance of \(\tau \) given Y, as well as the distribution of Y. We next derive (i) the conditional mean and variance of \(\tau \) and (ii) the distribution of Y. We provide details on the re-expression of \(a(N,p)\) in the body of the proof.
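To make the posterior over v concrete, the following sketch evaluates it on a grid using Bayes' rule, with the likelihood \(\tau ^{Y}(1-\tau )^{N-Y}\) weighted by the prior. The uniform default prior, the midpoint grid, the function names, and the example values are all assumptions for illustration; they are not taken from the paper.

```python
def posterior_on_grid(y_count, n, p, prior=lambda v: 1.0, grid=2000):
    """Return midpoint nodes v_j and normalized posterior masses for v given Y."""
    nodes = [(j + 0.5) / grid for j in range(grid)]
    weights = []
    for v in nodes:
        tau = p * v + (1 - p) * (1 - v)  # probability of an honest 'y' given v
        weights.append(tau ** y_count * (1 - tau) ** (n - y_count) * prior(v))
    total = sum(weights)
    return nodes, [w / total for w in weights]

def posterior_mean(nodes, probs):
    """Posterior expectation of v on the grid."""
    return sum(v * q for v, q in zip(nodes, probs))

# Seven 'y' answers out of ten, with and without noise (hypothetical values).
nodes_noisy, probs_noisy = posterior_on_grid(y_count=7, n=10, p=0.8)
mean_noisy = posterior_mean(nodes_noisy, probs_noisy)
nodes_exact, probs_exact = posterior_on_grid(y_count=7, n=10, p=1.0)
mean_exact = posterior_mean(nodes_exact, probs_exact)
print(mean_noisy, mean_exact)
```

With p = 1 and a uniform prior the posterior is a Beta(Y+1, N-Y+1) density, so the grid mean should sit close to (Y+1)/(N+2), which gives a quick sanity check on the discretization.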

Since \(\tau =pv+(1-p)(1-v)\) and since v has conditional distribution h(v|Y) on [0, 1], the distribution of \(\tau \) given Y is

$$\begin{aligned} f_{\tau }(\tau |Y)=\frac{1}{2p-1}h\big (\frac{\tau -(1-p)}{2p-1}|Y\big ) \end{aligned}$$

and its support is \([1-p,p]\) (see Note 29). It follows that the expectation of \(\tau \) given Y is \(\mathbb {E}(\tau |Y)=\int _{1-p}^{p}\tau f_{\tau }(\tau |Y)d\tau \) and that the variance of \(\tau \) given Y is \(\mathbb {V}(\tau |Y)=\mathbb {E}(\tau ^{2}|Y)-\mathbb {E}(\tau |Y)^{2}\), where \(\mathbb {E}(\tau ^{2}|Y)=\int _{1-p}^{p}\tau ^{2}f_{\tau }(\tau |Y)d\tau .\) Two observations will prove useful in the re-expression of \(a(N,p)\). First, \(d\tau =(2p-1)dv\) and so \(h(v|Y)dv=f_{\tau }(\tau |Y)d\tau \). Second,

$$\begin{aligned} \mathbb {E}(\tau |Y)&=p\mathbb {E}(v|Y)+(1-p)(1-\mathbb {E}(v|Y))\\ \mathbb {V}(\tau |Y)&=(2p-1)^{2}\mathbb {V}(v|Y), \end{aligned}$$

where \(\mathbb {E}(v|Y)=\int _{0}^{1}vh(v|Y)dv\) and \(\mathbb {V}(v|Y)=\int _{0}^{1}(v-\mathbb {E}(v|Y))^{2}h(v|Y)dv\) are respectively the expected value and variance of v given Y. Analogously,

$$\begin{aligned} \mathbb {E}(\tau )&=p\mathbb {E}(v)+(1-p)(1-\mathbb {E}(v))\\ \mathbb {V}(\tau )&=(2p-1)^{2}\mathbb {V}(v), \end{aligned}$$

are respectively the expected value and variance of \(\tau \), where \(\mathbb {E}(v)=\int _{0}^{1}vf_{v}(v)dv\) and \(\mathbb {V}(v)=\int _{0}^{1}(v-\mathbb {E}(v))^{2}f_{v}(v)dv\) are respectively the expected value and variance of v.

The random variable Y is binomially distributed, i.e., the probability of Y honest answers of y from N respondents is

$$\begin{aligned} \pi (Y)={N \atopwithdelims ()Y}\int _{0}^{1}\tau ^{Y}(1-\tau )^{N-Y}f_{v}(v)dv. \end{aligned}$$
(12)

Simply, given a response vector \(\varvec{r}\), there are \({N \atopwithdelims ()Y(\varvec{r})}-1\) other response vectors that have \(Y(\varvec{r})\) answers of y. Since the probability of each of these vectors is \(\int _{0}^{1}\tau ^{Y(\varvec{r})}(1-\tau )^{N-Y(\varvec{r})}f_{v}(v)dv\), Eq. (12) obtains.
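A quick numerical check of the distribution of Y is sketched below. It integrates the per-vector probability \(\tau ^{Y}(1-\tau )^{N-Y}\) against the prior on a midpoint grid and multiplies by the binomial coefficient. The uniform prior, the grid size, the function name, and the values of N and p are assumptions for illustration.

```python
from math import comb

def prob_y_count(y_count, n, p, prior=lambda v: 1.0, grid=2000):
    """Approximate pi(Y): probability of y_count honest 'y' answers from n respondents."""
    total = 0.0
    for j in range(grid):
        v = (j + 0.5) / grid                 # midpoint node on [0, 1]
        tau = p * v + (1 - p) * (1 - v)      # chance of a 'y' answer given v
        total += tau ** y_count * (1 - tau) ** (n - y_count) * prior(v)
    # comb(n, y_count) counts the response vectors sharing the same number of 'y' answers
    return comb(n, y_count) * total / grid

n, p = 6, 0.7  # hypothetical design
dist = [prob_y_count(y, n, p) for y in range(n + 1)]
print(sum(dist))
```

Because the binomial probabilities sum to one at every grid node, the values over Y = 0, ..., N should sum to one; under the assumed uniform prior the distribution is also symmetric in Y and N - Y.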

Part 1: To show that \(a(1,p)>a(2,p)\) for all \(p\in (\frac{1}{2},1)\), with equality at one-half and one, we re-write a(1, p) and a(2, p). For \(a(1,p)=\mu (s|y)-\mu (s|n)\), it is readily verified that (see Note 30):

$$\begin{aligned} \mu (s|y)&=\frac{\mu (s,y)}{\mu (y)}=\frac{\int _{0}^{1}\mu (s,y,\nu )d\nu }{\int _{0}^{1}\mu (y,\nu )d\nu }=\frac{\int _{0}^{1}p\nu f_{\nu }(\nu )d\nu }{\int _{0}^{1}[p\nu +(1-p)(1-\nu )]f_{\nu }(\nu )d\nu }\\&=\frac{p\mathbb {E}(\nu )}{p\mathbb {E}(\nu )+(1-p)(1-\mathbb {E}(\nu ))}\text { and }\\ \mu (s|n)&=\frac{\mu (s,n)}{\mu (n)}=\frac{\int _{0}^{1}\mu (s,n,\nu )d\nu }{\int _{0}^{1}\mu (n,\nu )d\nu }=\frac{\int _{0}^{1}(1-p)\nu f_{\nu }(\nu )d\nu }{\int _{0}^{1}[p(1-\nu )+(1-p)\nu ]f_{\nu }(\nu )d\nu }\\&=\frac{(1-p)\mathbb {E}(\nu )}{p(1-\mathbb {E}(\nu ))+(1-p)\mathbb {E}(\nu )}. \end{aligned}$$

Thus,

$$\begin{aligned} a(1,p)=\frac{p\mathbb {E}(\nu )}{p\mathbb {E}(\nu )+(1-p)(1-\mathbb {E}(\nu ))}-\frac{(1-p)\mathbb {E}(\nu )}{p(1-\mathbb {E}(\nu ))+(1-p)\mathbb {E}(\nu )}, \end{aligned}$$

which simplifies to \(a(1,p)=2p-1\) when \(\mathbb {E}(v)=\frac{1}{2}\), as in Proposition 3. Changing variables to \(\tau \) and simplifying gives

$$\begin{aligned} a(1,p)=\frac{1}{2p-1}\Big [1-p(1-p)\Big (\frac{1}{\mathbb {E}(\tau )}+\frac{1}{1-\mathbb {E}(\tau )}\Big )\Big ]. \end{aligned}$$

The display equation is readily verified to be increasing in p.
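The change of variables can be checked numerically. The sketch below computes a(1, p) once from the posterior beliefs written in terms of \(\mathbb {E}(v)\), using the n-response probability \(p(1-\mathbb {E}(v))+(1-p)\mathbb {E}(v)\) from the integral expression for \(\mu (s|n)\), and once from the expression in \(\mathbb {E}(\tau )\). Function names and test values are illustrative assumptions.

```python
def a1_from_v(p, mean_v):
    """a(1,p) = mu(s|y) - mu(s|n), written with the prior mean of v."""
    mu_s_given_y = p * mean_v / (p * mean_v + (1 - p) * (1 - mean_v))
    mu_s_given_n = (1 - p) * mean_v / (p * (1 - mean_v) + (1 - p) * mean_v)
    return mu_s_given_y - mu_s_given_n

def a1_from_tau(p, mean_v):
    """The same quantity after changing variables to tau; requires p != 1/2."""
    mean_tau = p * mean_v + (1 - p) * (1 - mean_v)
    return (1 - p * (1 - p) * (1 / mean_tau + 1 / (1 - mean_tau))) / (2 * p - 1)

for p in (0.6, 0.75, 0.9):
    for mean_v in (0.2, 0.5, 0.8):
        print(p, mean_v, a1_from_v(p, mean_v), a1_from_tau(p, mean_v))
```

The two columns should agree up to rounding, and with a prior mean of one-half both reduce to 2p - 1.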

Regarding a(2, p), let \(\bar{r}\) denote the response of the other respondent. Then,

$$\begin{aligned} a(2,p)=\pi (\bar{y})(\mu (s|y,\bar{y})-\mu (s|n,\bar{y}))+\pi (\bar{n})(\mu (s|y,\bar{n})-\mu (s|n,\bar{n})), \end{aligned}$$

where \(\mu (s|r,\bar{r})\) denotes the interviewer’s posterior belief when respondent i states r and the other respondent states \(\bar{r}\). It is readily verified that:

$$\begin{aligned} \mu (s|y,\bar{y})&=\frac{\mu (s,y,\bar{y})}{\mu (y,\bar{y})}=\frac{\int _{0}^{1}\mu (s,y,\bar{y},\nu )d\nu }{\int _{0}^{1}\mu (y,\bar{y},\nu )d\nu }\\&=\frac{\int _{0}^{1}\mu (s,y|\nu )\mu (\bar{y}|\nu )f_{\nu }(\nu )d\nu }{\int _{0}^{1}\mu (y|\nu )\mu (\bar{y}|\nu )f_{\nu }(\nu )d\nu }\\&=\frac{\int _{0}^{1}p\nu (p\nu +(1-p)(1-\nu ))f_{\nu }(\nu )d\nu }{\int _{0}^{1}[p\nu +(1-p)(1-\nu )]^{2}f_{\nu }(\nu )d\nu } \end{aligned}$$

because a respondent’s type is conditionally independent of other respondents’ answers given v. Analogous logic gives

$$\begin{aligned} \mu (s|n,\bar{y})&=\frac{\int _{0}^{1}(1-p)\nu (p\nu +(1-p)(1-\nu ))f_{\nu }(\nu )d\nu }{\int _{0}^{1}[p(1-\nu )+(1-p)\nu ][p\nu +(1-p)(1-\nu )]f_{\nu }(\nu )d\nu },\\ \mu (s|y,\bar{n})&=\frac{\int _{0}^{1}p\nu (p(1-\nu )+(1-p)\nu )f_{\nu }(\nu )d\nu }{\int _{0}^{1}[p(1-\nu )+(1-p)\nu ][p\nu +(1-p)(1-\nu )]f_{\nu }(\nu )d\nu },\text { and }\\ \mu (s|n,\bar{n})&=\frac{\int _{0}^{1}(1-p)\nu (p(1-\nu )+(1-p)\nu )f_{\nu }(\nu )d\nu }{\int _{0}^{1}[p(1-\nu )+(1-p)\nu ]^{2}f_{\nu }(\nu )d\nu }. \end{aligned}$$

Changing variables to \(\tau \) and simplifying gives

$$\begin{aligned} \mu (s|y,\bar{y})&=\frac{\int _{0}^{1}p\nu (p\nu +(1-p)(1-\nu ))f_{\nu }(\nu )d\nu }{\int _{0}^{1}[p\nu +(1-p)(1-\nu )]^{2}f_{\nu }(\nu )d\nu }\\&=\frac{\int _{1-p}^{p}p(\tau -(1-p))\tau f_{\tau }(\tau )d\tau }{\int _{1-p}^{p}(2p-1)\tau ^{2}f_{\tau }(\tau )d\tau }\\&=\frac{1}{2p-1}\Big [p-\frac{p(1-p)\mathbb {E}(\tau )}{\mathbb {E}(\tau ^{2})}\Big ]. \end{aligned}$$

Analogous reasoning gives

$$\begin{aligned} \mu (s|n,\bar{y})&=\frac{1}{2p-1}\Big [-(1-p)+\frac{p(1-p)\mathbb {E}(\tau )}{\mathbb {E}(\tau )-\mathbb {E}(\tau ^{2})}\Big ],\\ \mu (s|y,\bar{n})&=\frac{1}{2p-1}\Big [p-\frac{p(1-p)(1-\mathbb {E}(\tau ))}{\mathbb {E}(\tau )-\mathbb {E}(\tau ^{2})}\Big ],\text { and }\\ \mu (s|n,\bar{n})&=\frac{1}{2p-1}\Big [-(1-p)+\frac{p(1-p)(1-\mathbb {E}(\tau ))}{1-2\mathbb {E}(\tau )+\mathbb {E}(\tau ^{2})}\Big ]. \end{aligned}$$

Since \(\pi (\bar{y})=\mathbb {E}(\tau )\) and \(\pi (\bar{n})=1-\mathbb {E}(\tau )\), the above display equations imply that

$$\begin{aligned} a(2,p)= & {} \frac{1}{2p-1}\bigg [1-p(1-p)\bigg (\frac{\mathbb {E}(\tau )^{2}}{\mathbb {V}(\tau )+\mathbb {E}(\tau )^{2}}+\frac{\mathbb {E}(\tau )^{2}+(1-\mathbb {E}(\tau ))^{2}}{\mathbb {E}(\tau )(1-\mathbb {E}(\tau ))-\mathbb {V}(\tau )}\\{} & {} \qquad +\frac{(1-\mathbb {E}(\tau ))^{2}}{(1-\mathbb {E}(\tau ))^{2}+\mathbb {V}(\tau )}\bigg )\bigg ]. \end{aligned}$$

Thus, by algebraic simplification, \(a(1,p)\ge a(2,p)\) if and only if

$$\begin{aligned} \frac{1}{\mathbb {E}(\tau )}+\frac{1}{1-\mathbb {E}(\tau )}\le & {} \frac{\mathbb {E}(\tau )^{2}}{\mathbb {V}(\tau )+\mathbb {E}(\tau )^{2}}+\frac{\mathbb {E}(\tau )^{2}+(1-\mathbb {E}(\tau ))^{2}}{\mathbb {E}(\tau )(1-\mathbb {E}(\tau ))-\mathbb {V}(\tau )}\nonumber \\{} & {} \qquad +\frac{(1-\mathbb {E}(\tau ))^{2}}{(1-\mathbb {E}(\tau ))^{2}+\mathbb {V}(\tau )} \end{aligned}$$
(13)

when \(p<1\). (It is clear that \(a(1,p)=a(2,p)\) when \(p=1\) because \(p(1-p)=0\).) Yet, it is readily verified that

$$\begin{aligned} \frac{1}{\mathbb {E}(\tau )}+\frac{1}{1-\mathbb {E}(\tau )}=\frac{\mathbb {E}(\tau )^{2}}{\mathbb {E}(\tau )^{2}}+\frac{\mathbb {E}(\tau )^{2}+(1-\mathbb {E}(\tau ))^{2}}{\mathbb {E}(\tau )(1-\mathbb {E}(\tau ))}+\frac{(1-\mathbb {E}(\tau ))^{2}}{(1-\mathbb {E}(\tau ))^{2}}. \end{aligned}$$

Thus, Eq. (13) holds with equality when \(\mathbb {V}(\tau )=0\). When \(\mathbb {V}(\tau )>0\), then algebra shows that Eq. (13) holds with strict inequality. Since \(\mathbb {V}(\tau )=(2p-1)^{2}\mathbb {V}(v)\) and since \(\mathbb {V}(v)>0\) by the full support assumption on \(f_{v}(v)\), we have that \(\mathbb {V}(\tau )=0\) if and only if \(p=\frac{1}{2}\). Consequently, \(a(1,p)>a(2,p)\) on \((\frac{1}{2},1)\), with equality at one-half and one.
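The comparison between a(1, p) and a(2, p) can be illustrated numerically from the mean-variance expressions above. The sketch assumes a uniform prior on v, so that \(\mathbb {E}(\tau )=\frac{1}{2}\) and \(\mathbb {V}(\tau )=(2p-1)^{2}/12\); that prior, the function names, and the grid of p values are assumptions for illustration.

```python
def a1(p, mean_tau):
    """a(1,p) in terms of the prior mean of tau; requires p != 1/2."""
    return (1 - p * (1 - p) * (1 / mean_tau + 1 / (1 - mean_tau))) / (2 * p - 1)

def a2(p, mean_tau, var_tau):
    """a(2,p) in terms of the prior mean and variance of tau; requires p != 1/2."""
    m, s2 = mean_tau, var_tau
    bracket = (m ** 2 / (s2 + m ** 2)
               + (m ** 2 + (1 - m) ** 2) / (m * (1 - m) - s2)
               + (1 - m) ** 2 / ((1 - m) ** 2 + s2))
    return (1 - p * (1 - p) * bracket) / (2 * p - 1)

def uniform_prior_moments(p):
    """Mean and variance of tau when v is uniform on [0, 1] (assumed prior)."""
    return 0.5, (2 * p - 1) ** 2 / 12

for p in (0.55, 0.65, 0.75, 0.85, 0.95, 1.0):
    m, s2 = uniform_prior_moments(p)
    print(p, a1(p, m), a2(p, m, s2))
```

On the interior values of p the second column should be strictly smaller than the first, while at p = 1 both equal one. Setting the variance to zero in `a2` recovers `a1`, mirroring the equality case of Eq. (13).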

Part 2: We show that \(a(k,p)>a(k+1,p)\) for all \(p\in (\frac{1}{2},1)\), with equality at one-half and one, provided the induction hypothesis holds. To this end, write

$$\begin{aligned} a(k,p)=\sum _{Z=0}^{k-1}\pi (Z)\big (\mu (s|y,Z)-\mu (s|n,Z)\big ), \end{aligned}$$

where \(\mu (s|r_{i},Z)\) is the interviewer’s posterior belief after observing respondent i’s answer \(r_{i}\) and Z answers of y among the other \(k-1\) respondents. Further, we have that

$$\begin{aligned} \mu (s|y,Z)-\mu (s|n,Z)&=\frac{p\mathbb {E}(\nu |Z)}{p\mathbb {E}(\nu |Z)+(1-p)(1-\mathbb {E}(\nu |Z))}\\&\quad -\frac{(1-p)\mathbb {E}(\nu |Z)}{p(1-\mathbb {E}(\nu |Z))+(1-p)\mathbb {E}(\nu |Z)}\\&=\frac{1}{2p-1}\Big [1-p(1-p)\Big (\frac{1}{\mathbb {E}(\tau |Z)}+\frac{1}{1-\mathbb {E}(\tau |Z)}\Big )\Big ], \end{aligned}$$

where the first equality follows from reasoning akin to that used for a(1, p) and the second equality follows from changing variables to \(\tau \). Thus,

$$\begin{aligned} a(k,p)=\sum _{Z=0}^{k-1}\pi (Z)\Bigg (\frac{1}{2p-1}\Big [1-p(1-p)\Big (\frac{1}{\mathbb {E}(\tau |Z)}+\frac{1}{1-\mathbb {E}(\tau |Z)}\Big )\Big ]\Bigg ). \end{aligned}$$

Regarding \(a(k+1,p)\), let \(\bar{r}\) denote the response of the \(k+1\)-st respondent. We have \(a(k+1,p)\) equal to the following:

$$\begin{aligned}{} & {} \sum _{Z=0}^{k-1}\big (\pi (Z,\bar{y})(\mu (s|y,\bar{y},Z)-\mu (s|n,\bar{y},Z))+\pi (Z,\bar{n})(\mu (s|y,\bar{n},Z)-\mu (s|n,\bar{n},Z))\big )\\{} & {} \quad =\sum _{Z=0}^{k-1}\bigg [\pi (Z,\bar{y})\bigg (\frac{1}{2p-1}\bigg [1-p(1-p)\bigg (\frac{\mathbb {E}(\tau |Z)}{\mathbb {V}(\tau |Z)+\mathbb {E}(\tau |Z)^{2}}+\frac{\mathbb {E}(\tau |Z)}{\mathbb {E}(\tau |Z)(1-\mathbb {E}(\tau |Z))-\mathbb {V}(\tau |Z)}\bigg )\bigg ]\bigg )\\{} & {} \quad +\pi (Z,\bar{n})\bigg (\frac{1}{2p-1}\bigg [1-p(1-p)\bigg (\frac{(1-\mathbb {E}(\tau |Z))}{\mathbb {E}(\tau |Z)(1-\mathbb {E}(\tau |Z))-\mathbb {V}(\tau |Z)}+\frac{(1-\mathbb {E}(\tau |Z))}{(1-\mathbb {E}(\tau |Z))^{2}+\mathbb {V}(\tau |Z)}\bigg )\bigg ]\bigg )\bigg ] \end{aligned}$$

by reasoning akin to that used for a(2, p), where \(\pi (Z,\bar{r})\) is the joint probability of Z answers of y among the first \(k-1\) other respondents and a response of \(\bar{r}\) by respondent \(k+1\). Yet, \(\pi (Z,\bar{y})=\pi (Z)\pi (\bar{y}|Z)=\pi (Z)\mathbb {E}(\tau |Z)\) and \(\pi (Z,\bar{n})=\pi (Z)(1-\mathbb {E}(\tau |Z))\) so we have

$$\begin{aligned} a(k+1,p)&=\sum _{Z=0}^{k-1}\pi (Z)\bigg (\frac{1}{2p-1}\bigg [1-p(1-p)\bigg (\frac{\mathbb {E}(\tau |Z)^{2}}{\mathbb {V}(\tau |Z)+\mathbb {E}(\tau |Z)^{2}}\\&\quad +\frac{\mathbb {E}(\tau |Z)^{2}+(1-\mathbb {E}(\tau |Z))^{2}}{\mathbb {E}(\tau |Z)(1-\mathbb {E}(\tau |Z))-\mathbb {V}(\tau |Z)}+\frac{(1-\mathbb {E}(\tau |Z))^{2}}{(1-\mathbb {E}(\tau |Z))^{2}+\mathbb {V}(\tau |Z)}\bigg )\bigg ]\bigg ). \end{aligned}$$

Algebraic simplification gives that \(a(k,p)\ge a(k+1,p)\) when

$$\begin{aligned}{} & {} \frac{1}{\mathbb {E}(\tau |Z)}+\frac{1}{1-\mathbb {E}(\tau |Z)}\le \frac{\mathbb {E}(\tau |Z)^{2}}{\mathbb {V}(\tau |Z)+\mathbb {E}(\tau |Z)^{2}}+\frac{\mathbb {E}(\tau |Z)^{2}+(1-\mathbb {E}(\tau |Z))^{2}}{\mathbb {E}(\tau |Z)(1-\mathbb {E}(\tau |Z))-\mathbb {V}(\tau |Z)}\nonumber \\{} & {} \qquad \qquad +\frac{(1-\mathbb {E}(\tau |Z))^{2}}{(1-\mathbb {E}(\tau |Z))^{2}+\mathbb {V}(\tau |Z)} \end{aligned}$$
(14)

for each \(Z\in \{0,1,\ldots ,k-1\}\) when \(p<1\). (If \(p=1\), then \(a(k,p)=a(k+1,p)\) since \(p(1-p)=0\).) Yet, Eq. (14) is of the same form as Eq. (13) and thus Eq. (14) holds with strict inequality whenever \(\mathbb {V}(\tau |Z)>0\). Since \(\mathbb {V}(\tau |Z)=(2p-1)^{2}\mathbb {V}(v|Z)\) and since \(\mathbb {V}(v|Z)>0\) by the full support assumption on \(f_{v}(v)\), we have that \(\mathbb {V}(\tau |Z)=0\) if and only if \(p=\frac{1}{2}\). Consequently, \(a(k,p)>a(k+1,p)\) on \((\frac{1}{2},1)\), with equality at one-half and one.
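The sum representation of a(k, p) lends itself to direct computation. The sketch below evaluates it for several sample sizes by integrating over \(\tau \) on a midpoint grid, assuming a uniform prior on v (so \(\tau \) is uniform on \([1-p,p]\)). The prior, the grid size, the function name, and the choice p = 0.75 are assumptions for illustration.

```python
from math import comb

def a_value(k, p, grid=2000):
    """Approximate a(k,p) from its sum over Z, the 'y' count among the k-1 others."""
    n_others = k - 1
    # Midpoint nodes for tau on [1-p, p]; each carries prior mass 1/grid.
    nodes = [(1 - p) + (2 * p - 1) * (j + 0.5) / grid for j in range(grid)]
    total = 0.0
    for z in range(n_others + 1):
        like = [comb(n_others, z) * t ** z * (1 - t) ** (n_others - z) for t in nodes]
        prob_z = sum(like) / grid                                   # pi(Z)
        mean_tau = sum(t * l for t, l in zip(nodes, like)) / grid / prob_z  # E(tau | Z)
        total += prob_z * (1 - p * (1 - p) * (1 / mean_tau + 1 / (1 - mean_tau))) / (2 * p - 1)
    return total

values = [a_value(k, 0.75) for k in range(1, 7)]
for k, val in enumerate(values, start=1):
    print(k, val)
```

Under these assumptions the printed values should fall strictly as k grows, starting from a(1, p) = 2p - 1, which is the monotonicity the induction step establishes.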

Part 3: We close by deriving \(a(\infty ,p)\). Since \(\pi (Y)=\int _{1-p}^{p}\pi (Y|\tau )f_{\tau }(\tau )d\tau \), where \(\pi (Y|\tau )\) is the conditional distribution of Y given \(\tau \), write

$$\begin{aligned} a(\infty ,p)=\frac{1}{2p-1}\bigg [1-p(1-p)\int _{1-p}^{p}f_{\tau }(\tau )\lim _{N\rightarrow \infty }\sum _{Z=0}^{N-1}\Big (\frac{\pi (Z|\tau )}{\mathbb {E}(\tau |Z)}+\frac{\pi (Z|\tau )}{1-\mathbb {E}(\tau |Z)}\Big )d\tau \bigg ]. \end{aligned}$$

We will establish that

$$\begin{aligned} \lim _{N\rightarrow \infty }\sum _{Z=0}^{N-1}\Big (\frac{1}{\mathbb {E}(\tau |Z)}\pi (Z|\tau )+\frac{1}{1-\mathbb {E}(\tau |Z)}\pi (Z|\tau )\Big )=\frac{1}{\tau }+\frac{1}{1-\tau }. \end{aligned}$$

Consequently,

$$\begin{aligned} a(\infty ,p)=\frac{1}{2p-1}\Big [1-p(1-p)\int _{1-p}^{p}\Big (\frac{1}{\tau }+\frac{1}{1-\tau }\Big )f_{\tau }(\tau )d\tau \Big ], \end{aligned}$$

which (i) is readily verified to be increasing in p (via Leibniz’s rule) and (ii) reduces to the specific form given in Proposition 4 when \(f_{v}=1\).
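For the uniform case \(f_{v}=1\), the density of \(\tau \) is \(1/(2p-1)\) on \([1-p,p]\), and integrating \(1/\tau +1/(1-\tau )\) directly gives \(2\ln (p/(1-p))/(2p-1)\). The sketch below compares the resulting closed form with a midpoint-rule evaluation of the integral. The closed form is our own direct integration under the uniform assumption (the proposition it should correspond to is not reproduced in this excerpt), and the function names and grid size are illustrative.

```python
from math import log

def a_inf_closed_form(p):
    """a(inf,p) under a uniform prior on v, by direct integration; needs 1/2 < p < 1."""
    return (1 - 2 * p * (1 - p) * log(p / (1 - p)) / (2 * p - 1)) / (2 * p - 1)

def a_inf_numeric(p, grid=20000):
    """Midpoint-rule evaluation of the integral expression for a(inf,p)."""
    integral = 0.0
    for j in range(grid):
        tau = (1 - p) + (2 * p - 1) * (j + 0.5) / grid
        # f_tau(tau) * d_tau = (1/(2p-1)) * ((2p-1)/grid) = 1/grid
        integral += (1 / tau + 1 / (1 - tau)) / grid
    return (1 - p * (1 - p) * integral) / (2 * p - 1)

for p in (0.6, 0.75, 0.9):
    print(p, a_inf_closed_form(p), a_inf_numeric(p), 2 * p - 1)
```

The first two columns should agree closely, and both should lie below 2p - 1, the single-respondent value under the same prior.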

Consider

$$\begin{aligned} \lim _{N\rightarrow \infty }\sum _{Z=0}^{N-1}\frac{1}{\mathbb {E}(\tau |Z)}\pi (Z|\tau ), \end{aligned}$$

which is a conditional expectation of the random variable Z, which has binomial distribution and probability of success \(\tau \). The Normal Approximation to the Binomial thus gives \(\frac{Z}{N}\sim N(\tau ,\frac{\tau (1-\tau )}{N})\) as N grows large; see Mood et al. (1974) for details. Let \(R=\frac{Z}{N}\) and let r denote a realization of R. For each r, \(\mathbb {E}(\tau |Nr)\) is a Bayesian estimator and so converges to r as \(N\rightarrow \infty \) by the full support of \(f_{v}\) (see Note 31). Hence, we write

$$\begin{aligned} \lim _{N\rightarrow \infty }\sum _{Z=0}^{N-1}\frac{1}{\mathbb {E}(\tau |Z)}\pi (Z|\tau )= & {} \lim _{N\rightarrow \infty }\int _{-\infty }^{\infty }\frac{1}{\mathbb {E}(\tau |Nr)}\phi _{R}(r,N)dr\\= & {} \lim _{N\rightarrow \infty }\int _{-\infty }^{\infty }\frac{1}{r}\phi _{R}(r,N)dr=\frac{1}{\tau }, \end{aligned}$$

where \(\phi _{R}\) is the density of R at N and where the last equality follows from the fact that the distribution of R collapses to \(\tau \) as N grows large. Analogous reasoning gives that

$$\begin{aligned} \lim _{N\rightarrow \infty }\sum _{Z=0}^{N-1}\frac{1}{1-\mathbb {E}(\tau |Z)}\pi (Z|\tau )=\frac{1}{1-\tau }. \end{aligned}$$

This concludes the proof. \(\square \)
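The limiting argument in Part 3 can be probed numerically. The sketch below fixes a true \(\tau \), sums \(\pi (Z|\tau )/\mathbb {E}(\tau |Z)\) over the possible counts Z, and compares the result with \(1/\tau \) for a moderately large number of other respondents. It assumes a uniform prior on v, indexes the sum by the number of other respondents n rather than by N, and works in logs to avoid underflow; the names and parameter values are illustrative assumptions.

```python
from math import exp, lgamma, log

def binom_pmf(z, n, tau):
    """Binomial probability of z successes in n trials, computed in logs."""
    log_pmf = (lgamma(n + 1) - lgamma(z + 1) - lgamma(n - z + 1)
               + z * log(tau) + (n - z) * log(1 - tau))
    return exp(log_pmf)

def posterior_mean_tau(z, n, p, grid=400):
    """E(tau | Z=z) when tau is uniform on [1-p, p], via a midpoint grid."""
    nodes = [(1 - p) + (2 * p - 1) * (j + 0.5) / grid for j in range(grid)]
    logs = [z * log(t) + (n - z) * log(1 - t) for t in nodes]
    top = max(logs)                      # rescale before exponentiating
    weights = [exp(l - top) for l in logs]
    return sum(t * w for t, w in zip(nodes, weights)) / sum(weights)

def expected_inverse(n, p, tau):
    """Sum over z of pi(z | tau) / E(tau | z)."""
    return sum(binom_pmf(z, n, tau) / posterior_mean_tau(z, n, p) for z in range(n + 1))

limit_check = expected_inverse(n=400, p=0.8, tau=0.35)
print(limit_check, 1 / 0.35)
```

For a \(\tau \) in the interior of \([1-p,p]\), the two printed numbers should already be close at n = 400, in line with the claim that the sum converges to \(1/\tau \).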

Proof of Proposition 12. Obvious and omitted. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fisher, J.C.D., Flannery, T.J. Designing randomized response surveys to support honest answers to stigmatizing questions. Rev Econ Design 27, 635–667 (2023). https://doi.org/10.1007/s10058-022-00314-6

  • DOI: https://doi.org/10.1007/s10058-022-00314-6
