A criterion for privacy protection in data collection and its attainment via randomized response procedures

Abstract: Randomized response (RR) methods have long been suggested for protecting respondents' privacy in statistical surveys. However, how to set and achieve privacy protection goals has received little attention. We give a full development and analysis of the view that a privacy mechanism should ensure that no intruder would gain much new information about any respondent from his response. Formally, we say that a privacy breach occurs when an intruder's prior and posterior probabilities about a property of a respondent, denoted p and p*, respectively, satisfy p* < h_l(p) or p* > h_u(p), where h_l and h_u are two given functions. An RR procedure protects privacy if it does not permit any privacy breach. We explore the effects of (h_l, h_u) on the resultant privacy demand, and prove that it is precisely attainable only for certain (h_l, h_u). This result is used to define a canonical strict privacy protection criterion and to give practical guidance on the choice of (h_l, h_u). Then, we characterize all privacy satisfying RR procedures, compare their effects on data utility using sufficiency of experiments, and identify the class of all admissible procedures. Finally, we establish an optimality property of a commonly used RR method.


Introduction
In recent years, businesses, organizations and government agencies have been gathering increasingly vast amounts of data from surveys, commercial transactions, on-line searches and postings, medical records and other sources, and heavily using data analytics in making business and policy decisions. Simultaneously, concerns about privacy and data confidentiality have been increasing substantially. Protecting privacy and personal information is essential for legal reasons and for upholding public trust and support. Several books, e.g., Willenborg and de Waal (2001), Aggarwal and Yu (2008), Hundepool et al. (2012) and Torra (2017), and many papers discuss various privacy and confidentiality protection methods such as grouping, data swapping, cell suppression, imputation and response randomization.
Privacy violations occur in many forms depending on data type, privacy desires and intruders' knowledge and behavior. Thus, various privacy concepts and measures have appeared in the literature, including identity disclosure, differential privacy, k-anonymity and l-diversity (see Chen et al., 2009). Fung et al. (2010) present a systematic review of different approaches. However, as Kifer and Lin (2012) noted, most privacy measures are developed intuitively and can lead us astray, and thus one should use privacy criteria that are logically sound and practical. Evfimievski et al. (2003) introduced one such criterion, called ρ1-to-ρ2 privacy, in the context of randomized response (RR) surveys of categorical variables. Nayak et al. (2015) proposed a similar criterion, called β-factor privacy. The main objectives of this paper are to present some new perspectives on these two criteria and develop and explore the underlying ideas in full generality. Interestingly, we find that any privacy specification amounts to putting an upper bound on all Bayes factors. Thus, privacy needs should be assessed most appropriately in terms of Bayes factors. We obtain a complete characterization of all RR procedures that satisfy any specified privacy criterion. Moreover, we compare all privacy preserving procedures by data utility and identify the admissible procedures.
To describe the context and concepts, we consider a categorical survey variable (or a cross-classification of several variables) X with the set of possible categories S_X = {c_1, . . . , c_k}. Let π_i, i = 1, . . . , k, denote the population level relative frequencies of c_1, . . . , c_k, which are unknown. We collect data to estimate π = (π_1, . . . , π_k) and make other inferences about π. To protect privacy, an RR survey asks each respondent to use a given random mechanism to generate and report a perturbed version Z of his/her true value of X. We refer to Warner (1965), Chaudhuri and Mukerjee (1988) and Chaudhuri (2010) for reviews of RR theory and additional references. Denote the output space by S_Z = {d_1, . . . , d_m}. The transition probabilities p_ij = P(Z = d_i | X = c_j), i = 1, . . . , m, j = 1, . . . , k, are prespecified and embedded in the randomization device. The matrix P = ((p_ij)), called the transition probability matrix (TPM), determines all statistical properties of the RR mechanism, and designing an RR survey essentially reduces to choosing P. Thus, we shall identify an RR procedure by its underlying P. To define m and S_Z unambiguously, we shall require that each row of P contains at least one nonzero element; note that if the ith row of P is zero, then P(Z = d_i) is always zero and d_i is irrelevant. The columns of P are also called channel distributions; see Duchi et al. (2018). Note that the sample spaces S_X and S_Z of X and Z, respectively, need not be the same, or even have the same cardinality. For example, in the RAPPOR algorithm of Erlingsson et al. (2014), m = 2^k.
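As a concrete sketch of a TPM, consider Warner's (1965) two-category design; the value q = 0.75 below is an assumed illustration, not from the paper. With probability q the respondent reports the true category, and otherwise the other one.

```python
import numpy as np

q = 0.75  # assumed design parameter: probability of answering truthfully
# p_ij = P(Z = d_i | X = c_j); here S_X = S_Z have k = m = 2 categories
P = np.array([[q, 1 - q],
              [1 - q, q]])

# Each column of P is a channel distribution, so every column sums to 1.
assert np.allclose(P.sum(axis=0), 1.0)
```

Here no row of P is zero, so m = 2 and S_Z are well defined.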
Clearly, an RR survey generates data on Z (and not on X). Under simple random sampling, the distribution of Z is determined by

λ = Pπ. (1.1)

Thus, the data on Z can be used to estimate λ. To estimate π, essentially one would need to use (1.1) and an estimate of λ, say λ̂. If k = m and P is nonsingular, π is estimated using π̂ = P^{-1}λ̂; see, e.g., Chaudhuri and Mukerjee (1988). If m < k, or more generally if the columns of P are not linearly independent, the model for Z is not identifiable with respect to π and hence π is not estimable. Thus, inference and data utility considerations suggest using m ≥ k and P with rank k. In a parametric model, if π is a function of fewer parameters, identifiability might hold even when m < k. However, identifiability with respect to π ensures that the data set can be analyzed using different models for various (possibly unforeseen) purposes.
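A minimal simulation sketch of this estimation route, under assumed values (a Warner-type P as an example): estimate λ from the observed Z-frequencies and invert (1.1).

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.3, 0.7])                 # true (unknown) frequencies, assumed
q = 0.75
P = np.array([[q, 1 - q], [1 - q, q]])    # nonsingular, k = m = 2
lam = P @ pi                              # lambda = P pi, eq. (1.1)

# Simulate n randomized responses Z, estimate lambda, then invert for pi.
n = 100_000
z = rng.choice(2, size=n, p=lam)
lam_hat = np.bincount(z, minlength=2) / n
pi_hat = np.linalg.solve(P, lam_hat)      # pi_hat = P^{-1} lam_hat
```

Because each column of P sums to 1, π̂ automatically sums to 1, although individual components can fall outside [0, 1] in small samples.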
In the preceding framework, Evfimievski et al. (2003) defined ρ1-to-ρ2 privacy, taking a Bayesian view. For a target respondent B, suppose an intruder R's prior probability of X = c_j is α_j, j = 1, . . . , k, and let α = (α_1, . . . , α_k). Note that an intruder's prior α about a target may be quite different from π. For a given prior α, R's prior and posterior probabilities of any Q ⊆ S_X = {c_1, . . . , c_k} are

P_α(X ∈ Q) = Σ_{j: c_j ∈ Q} α_j and P_α(X ∈ Q | Z = d_i) = [Σ_{j: c_j ∈ Q} p_ij α_j] / [Σ_{j=1}^{k} p_ij α_j].

For brevity, we shall denote P_α(X ∈ Q) by P_α(Q) and P_α(X ∈ Q | Z = d_i) by P_α(Q|d_i).
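As a numerical sketch of these formulas, with assumed values (a Warner-type P):

```python
import numpy as np

P = np.array([[0.75, 0.25], [0.25, 0.75]])   # p_ij = P(Z = d_i | X = c_j)
alpha = np.array([0.1, 0.9])                 # intruder R's prior about B

# Posterior over categories after observing Z = d_1 (index i = 0):
# P_alpha(X = c_j | Z = d_i) = p_ij alpha_j / sum_l p_il alpha_l
i = 0
posterior = P[i] * alpha / (P[i] @ alpha)
```

Here the prior probability of Q = {c_1} is 0.1 and the posterior is 0.25, so observing d_1 raises R's probability of that category by a factor of 2.5.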
Definition 1.1. (Evfimievski et al., 2003) (a) An RR procedure is said to permit an upward ρ1-to-ρ2 privacy breach with respect to Q ⊆ S_X and a prior distribution α if, for some d_i with P_α(Z = d_i) > 0,

P_α(Q) ≤ ρ1 and P_α(Q|d_i) > ρ2.

(b) Similarly, a procedure is said to admit a downward ρ2-to-ρ1 privacy breach if

P_α(Q) ≥ ρ2 and P_α(Q|d_i) < ρ1.

An RR procedure is said to provide ρ1-to-ρ2 privacy protection if it does not permit an upward ρ1-to-ρ2 privacy breach or a downward ρ2-to-ρ1 privacy breach with respect to any Q and any prior α.

Definition 1.2. (Nayak et al., 2015) For a given β > 1, an RR procedure admits a β-factor privacy breach with respect to Q ⊆ S_X and a prior α if P_α(Q) > 0 and, for some d_i with P_α(Z = d_i) > 0,

P_α(Q|d_i)/P_α(Q) > β or P_α(Q|d_i)/P_α(Q) < 1/β.

An RR procedure provides β-factor privacy if it does not allow a β-factor breach with respect to any Q and any α.
The above two criteria are very strong, as they require no privacy breach for any d_i, Q and α. Thus, no answer (d_i) of a respondent B would give "much" new information to any intruder R (characterized by α) about any property (Q) of B with respect to X. Evfimievski et al. (2004) introduced a similar concept of privacy breach in privacy preserving association rule mining. In practice, the values of (ρ1, ρ2) and β should be chosen based on the sensitivity of X and privacy concerns. Here, β-factor privacy is simpler, as it requires us to specify only one number (β). Interestingly, the strict privacy requirements of the two criteria are achievable, as summarized below.

Definition 1.3. (Nayak et al., 2015) The ith row parity of P is defined as

η_i(P) = max_j p_ij / min_j p_ij,

with the convention 0/0 = 1 and a/0 = ∞ for any a > 0. Furthermore, the parity of P is defined as η(P) = max_i {η_i(P)}.

Theorem 1.1. (Evfimievski et al., 2003) An RR procedure P provides ρ1-to-ρ2 privacy protection if

η(P) ≤ ρ2(1 − ρ1)/[ρ1(1 − ρ2)]. (1.6)

Theorem 1.2. (Nayak et al., 2015) An RR procedure P provides β-factor privacy if and only if η(P) ≤ β.
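The following sketch computes row parities and η(P) under the stated conventions, and also checks a diagonal-type construction with a prescribed parity η0; the numerical values are assumed for illustration.

```python
import numpy as np

def row_parity(row):
    """eta_i(P) = max_j p_ij / min_j p_ij, with 0/0 = 1 and a/0 = inf (a > 0)."""
    hi, lo = max(row), min(row)
    if lo == 0:
        return 1.0 if hi == 0 else float("inf")
    return hi / lo

def parity(P):
    """eta(P) = max_i eta_i(P)."""
    return max(row_parity(row) for row in np.asarray(P, dtype=float))

# Warner-type example: every row parity is 0.75 / 0.25 = 3.
P = np.array([[0.75, 0.25], [0.25, 0.75]])
eta = parity(P)

# A k x k TPM with parity exactly eta0 (one standard construction):
# p_jj = eta0 / (eta0 + k - 1) and p_ij = 1 / (eta0 + k - 1) for i != j.
k, eta0 = 4, 3.0
P0 = np.full((k, k), 1.0 / (eta0 + k - 1))
np.fill_diagonal(P0, eta0 / (eta0 + k - 1))
assert np.allclose(P0.sum(axis=0), 1.0)   # valid TPM
```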
We shall see later that (1.6) is also a necessary condition for P to provide ρ1-to-ρ2 privacy. We should mention that Boreale and Paolini's (2015) concept of "worst-case breach" is essentially the same as a β-factor breach. They also proved a version of Theorem 1.2. The concept of parity is very similar to the γ-amplification of Evfimievski et al. (2003). Clearly, η(P) ≥ 1, and it is finite if and only if all elements of P are positive. Also, for any given η_0, it is possible to construct P with parity η_0; see Evfimievski et al. (2003) and Agrawal et al. (2009). In particular, for m = k, one P with η(P) = η_0 is obtained by taking p_jj = η_0/(η_0 + k − 1) and p_ij = 1/(η_0 + k − 1) for i ≠ j. The rest of the paper is organized as follows. In Section 2, we present some new perspectives on ρ1-to-ρ2 and β-factor privacy, including a geometric view and an equivalence with ε-differential local privacy, and then propose a general privacy criterion (in Definition 2.2) that covers Definitions 1.1 and 1.2 as special cases. Essentially, we pursue the spirit of ρ1-to-ρ2 and β-factor privacy to the fullest extent and permit any (reasonable) privacy breach criterion. In Section 3, we explore implications and practicality of the general criterion. We develop a canonical form of the general criterion and characterize all RR procedures that provide the required privacy. In Theorem 3.1, we prove that P satisfies a specified privacy demand if and only if η(P) is appropriately small. In Section 4, we compare the data utility of all privacy satisfying P. Employing Blackwell's (1951, 1953) concept of sufficiency of experiments, which is agnostic about inferential goals and loss functions, we characterize the class of all admissible privacy preserving procedures. We also prove a particular optimality property of a simple RR procedure. We note some concluding remarks in Section 5.

A general criterion
To motivate a general criterion, we shall first discuss some logical and practical features of Definitions 1.1 and 1.2. The ρ1-to-ρ2 and β-factor privacy criteria are very strong, but it should be noted that they are applicable only when an intruder R knows his/her target B's value of Z. Typically, this happens at data collection time, with R being the data collector. In the commercial data mining context, Agrawal et al. (2009) refer to this as business-to-customer (or B2C) privacy. Definitions 1.1 and 1.2 are not applicable if R gets access only to an anonymized version of the original data set, where B's records cannot be ascertained with certainty. In other words, the ρ1-to-ρ2 and β-factor privacy criteria presume disclosure of B's identity to an intruder.
Related to the preceding point, we also want to mention that while privacy and confidentiality have often been used synonymously, those should be distinguished due to some important differences (see Nayak et al., 2015). In legal terms, privacy is a person's right to freedom from intrusion into his/her information. Privacy emerges as a desire to share no or only obscured information with a data collector. Thus, privacy protection should occur at the time of data collection. In contrast, confidentiality is an obligation to prevent unauthorized access to private information. People often give their information trusting that their data will be used by researchers and policy makers only to learn about the population as a whole and not about any individual. Privacy applies to individuals whereas confidentiality applies to the data, which may be addressed after data collection. One important technical (and practical) implication is that one may examine the whole data set for choosing a suitable method for confidentiality protection. In contrast, methods for privacy protection need to be selected before data collection. Consequently, some concepts, such as k-anonymity and l-diversity, and related methods are applicable for protecting confidentiality but not privacy.
Remark 2.1. A referee suggested that we relate this paper to research on PRAM (the post-randomization method), introduced by Gouweleeuw et al. (1998). Many papers, e.g., Van den Hout and Van der Heijden (2002), Van den Hout and Elamir (2006) and Cruyff et al. (2008), discuss properties, variations and applications of the method. PRAM and RR are closely related, as both methods randomize true responses using a transition probability matrix P. However, PRAM and RR are not equivalent (see Nayak et al., 2015). RR is for privacy protection whereas PRAM aims to protect confidentiality. Consequently, one important distinction is that P is fixed in RR, but in PRAM, it can be chosen based on the observed (unperturbed) data set. In particular, in invariant (or unbiased) PRAM, P must depend on the data. Such data dependency of P affects statistical inferences, which has been noted and investigated in the literature.
Privacy and confidentiality concerns are different, so their measures and protection goals are also different. One of the top confidentiality breach concerns is identity disclosure, which occurs when the records of a survey unit are correctly identified in released data by matching the values of some of the variables that are easily available from other sources. Recently, Nayak et al. (2018) developed a PRAM procedure that strictly controls identification risks at very little loss of data utility. In summary, PRAM is a useful tool for confidentiality protection, but this paper, which deals with privacy protection, is not directly linked to PRAM.
As we discuss next, ρ1-to-ρ2 and β-factor privacy logically address the core privacy concern, which is: how much information might an intruder gain about a respondent from his/her (possibly perturbed) response? One compelling view of information, as Basu (1988) articulated, is: "Information is what information does. It changes opinion." Furthermore, opinion can be expressed precisely only using subjective probability. An intruder's prior and posterior probabilities describe, respectively, his/her initial opinion and his/her revised opinion after learning a respondent's reported value. These constitute a strong argument that privacy should be discussed in terms of intruders' prior and posterior probabilities (instead of technical information measures, e.g., mutual information and f-divergence, that were developed in other contexts). Definitions 1.1 and 1.2 coincide with the above view and are thus highly relevant to privacy considerations.
The changing of a prior into a posterior occurs only through the likelihood function, and the change is small if the likelihood function is relatively flat. In our setting, for response d_i, the likelihoods for c_1, . . . , c_k are p_ij, j = 1, . . . , k, and they are fairly close to each other when η_i(P) is small. Consequently, the likelihood functions for all possible responses are fairly flat if and only if η(P) is fairly small. This comes out precisely in Theorems 1.1 and 1.2.
We now mention a connection to the following concept of ε-differential local privacy (see Duchi et al., 2018).

Definition 2.1. An RR method provides ε-differential local privacy (ε-DLP) if

p_ij ≤ exp(ε) p_il for all i = 1, . . . , m and j, l = 1, . . . , k. (2.1)
It can be seen that (2.1) is equivalent to η(P) ≤ exp(ε). So, in view of Theorem 1.2, ε-DLP and β-factor privacy are equivalent, with β = exp(ε). An equivalence of ε-DLP and ρ1-to-ρ2 privacy can be observed similarly. Clearly, the thinking behind Definitions 1.1, 1.2 and 2.1 differs, but mathematically, they are equivalent, as each one corresponds to an upper bound on η(P). Figure 1 gives a helpful geometric perspective on ρ1-to-ρ2 and β-factor privacy. The two shaded rectangles represent the privacy breach region (PBR) of ρ1-to-ρ2 privacy, as any (prior, posterior) pair, to be denoted generically by (p, p*), falling in this region signifies a privacy breach. The two shaded triangles constitute the PBR of β-factor privacy. In practice, visual inspection of various PBRs might help in choosing the parameter values, e.g., (ρ1, ρ2) or β, of a privacy criterion, and also in comparing different privacy guarantees. Naturally, a larger PBR implies a stronger privacy guarantee. Of the two PBRs in Figure 1, neither is a subset of the other, but as the β-factor PBR has a larger area and covers most of the other PBR, one might reasonably consider it stronger. As such, two overlapping PBRs, as in Figure 1, are not comparable, but we shall see in Section 3 that the privacy demands of any two PBRs can be compared meaningfully.
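The equivalence noted above can be checked numerically; a sketch with an assumed Warner-type P:

```python
import numpy as np

P = np.array([[0.75, 0.25], [0.25, 0.75]])   # assumed example, all entries > 0
eta = (P.max(axis=1) / P.min(axis=1)).max()  # parity eta(P) = 3

# Smallest epsilon for which P is eps-DLP, and the equivalent beta-factor bound.
epsilon = np.log(eta)
beta = np.exp(epsilon)                        # beta = eta(P)
```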
One main goal of this paper is to explore the central idea of ρ1-to-ρ2 and β-factor privacy in full generality. Thus, we now consider a general privacy breach region W, as shown by the shaded region in Figure 2, and require that no (prior, posterior) pair fall in W. So, the unshaded part (W^c) is the privacy holding region. Describing the down and up privacy breach boundaries of W by two functions h_l and h_u, we introduce the following.

Definition 2.2. Let h_l and h_u be two functions from [0, 1] to [0, 1] such that h_l(a) ≤ a ≤ h_u(a) for all 0 ≤ a ≤ 1. An RR procedure provides (h_l, h_u)-strict information privacy ((h_l, h_u)-SIP) if

h_l(P_α(Q)) ≤ P_α(Q|d_i) ≤ h_u(P_α(Q)) (2.2)

for i = 1, . . . , m and all α, Q ⊆ S_X. Clearly, the general idea is that for privacy protection, if the prior probability of an event is p, then its posterior probability must be between h_l(p) and h_u(p). Obviously, this covers Definitions 1.1 and 1.2 as special cases. Mathematically, we do not need to put additional conditions on h_l and h_u, but intuitively, they should be nondecreasing. (We shall see in the sequel that all precise PBRs satisfy this condition.) Definition 2.2 specifies a privacy demand with its PBR

W[h_l, h_u] = {(p, p*) : 0 ≤ p, p* ≤ 1, p* < h_l(p) or p* > h_u(p)}.

On the other hand, the privacy provided by any RR procedure can be described by its PBR, as defined next.

Definition 2.3. We define the PBR of any RR procedure P as the collection of all non-attainable (prior, posterior) pairs under P, and denote it by W_P. Thus, W_P is the complement (with respect to the unit square) of P's privacy holding region: {(p, p*), 0 ≤ p, p* ≤ 1 : P_α(Q) = p and P_α(Q|d_i) = p* for some d_i, α and Q ⊆ S_X}.
Definition 2.4. We shall call a general privacy breach region W precise if there exists an RR procedure P such that W_P = W.
The preceding two definitions will be useful for comparing and matching the privacy demand with the privacy provided by different procedures. Clearly, if W[h_l, h_u] is not precise, then to guarantee (h_l, h_u)-SIP one must use an RR procedure P for which W_P is strictly larger than W[h_l, h_u], and in such cases, we should report W_P, the PBR of the procedure actually used, to communicate the privacy guarantee precisely and maximally. This also implies that to determine a privacy requirement we should think only in terms of precise PBRs. These observations raise some natural questions, such as: What are the precise PBRs? Which procedures satisfy a given precise PBR? For given h_l and h_u, is there a minimal W_P satisfying W[h_l, h_u] ⊆ W_P? We answer these questions in the next section.

Characterization of strict information privacy
We begin this section with some analytic simplifications of the (h_l, h_u)-SIP criterion. First, note that P_α(Q) = 0 implies P_α(Q|d_i) = 0 and P_α(Q) = 1 implies P_α(Q|d_i) = 1, for all d_i. So, (2.2) holds automatically when P_α(Q) is 0 or 1, and to establish (h_l, h_u)-SIP, we need to verify (2.2) only for all α and Q ⊆ S_X with 0 < P_α(Q) < 1. Next, since P_α(Q^c|d_i) = 1 − P_α(Q|d_i), the lower bound in (2.2) applied to Q^c yields P_α(Q|d_i) ≤ 1 − h_l(1 − P_α(Q)), i.e., the condition for a downward privacy breach is equivalent to an upward privacy breach criterion. (Evfimievski et al. (2003) made a similar observation for ρ1-to-ρ2 privacy.) Combining the two upward breach conditions and defining

[h_l h_u](a) = min{h_u(a), 1 − h_l(1 − a)}, 0 ≤ a ≤ 1,

we see that (2.2) is equivalent to P_α(Q|d_i) ≤ [h_l h_u](P_α(Q)) for all i = 1, . . . , m and all α and Q ⊆ S_X such that 0 < P_α(Q) < 1.

Note that a ≤ [h_l h_u](a) ≤ 1 for all 0 ≤ a ≤ 1. In view of the preceding discussion, we may define a privacy criterion more succinctly, only in terms of upward breaches, as follows.

Definition 3.1. Let h be a function from [0, 1] to [0, 1] with h(a) ≥ a for all a. An RR procedure provides h-canonical strict information privacy (h-CSIP) if

P_α(Q|d_i) ≤ h(P_α(Q))

for i = 1, . . . , m and all α and Q ⊆ S_X such that 0 < P_α(Q) < 1.

Lemma 3.1. For any h_l and h_u, an RR procedure provides (h_l, h_u)-SIP if and only if it provides [h_l h_u]-CSIP.
It can be seen that h-CSIP also provides the downward privacy guarantee that P_α(Q|d_i) ≥ h̃(P_α(Q)) for i = 1, . . . , m and all α and Q ⊆ S_X, where h̃(a) = 1 − h(1 − a), 0 ≤ a ≤ 1. Thus, the upper and lower boundaries of the PBR of h-CSIP are given by h and h̃, respectively. Lemma 3.1 shows that for any h_l and h_u, (h_l, h_u)-SIP and [h_l h_u]-CSIP are equivalent, in the sense that if an RR procedure guarantees one of the two, it must also guarantee the other. However, the PBR given by some (h_l, h_u)-SIP can be a proper subset of the PBR of the corresponding [h_l h_u]-CSIP. This is illustrated in Figure 3, where the PBR of [h_l h_u]-CSIP contains the PBR of (h_l, h_u)-SIP (shown as the shaded region). Subsequently, we shall explore only h-CSIP, because, as seen above, it is the analytical crux of any privacy criterion. For any given h, define

B(h) = inf_{0<a<1} [h(a)(1 − a)] / [a(1 − h(a))].

Theorem 3.1. An RR procedure P provides h-CSIP if and only if

η(P) ≤ B(h), (3.3)

i.e., p_ij ≤ B(h) p_il for all i = 1, . . . , m and j, l = 1, . . . , k.

The necessary and sufficient condition in Theorem 3.1 depends on h only through B(h) and on P only through its parity η(P). Thus, in the h-CSIP context, B(h) quantifies the privacy demand of h, and η(P) is the privacy level of P. We can measure the privacy demand of any general PBR, with downward and upward breach boundaries h_l and h_u, as B([h_l h_u]). Using this measure, we can compare the privacy demands of any two PBRs, even when they overlap as in Figure 1. Likewise, we can compare the privacy levels of all RR procedures using parity.
Consider an RR procedure P with η(P) = γ > 1. Then, by Theorem 3.1 and (3.3), P guarantees h-CSIP for all h such that

h(a) ≥ h^(γ)(a) = γa / [1 + (γ − 1)a], 0 ≤ a ≤ 1. (3.5)

Thus, h^(γ)(·) is the up breach boundary of any P with parity γ. The corresponding down boundary is h̃^(γ)(p) = 1 − h^(γ)(1 − p). Note that the PBR of P is determined only by its parity. So, all P with a common parity have the same PBR. It also follows that W is a precise PBR if and only if its up and down breach boundaries are h^(γ) and h̃^(γ), respectively, for some γ > 1. As γ increases, h^(γ)(p) shifts upward and the PBR gets smaller, as shown in Figure 4.
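A small sketch of these boundaries, using the closed form in (3.5) with assumed values:

```python
def h_gamma(p, gamma):
    """Up breach boundary h^(gamma)(p) = gamma p / (1 + (gamma - 1) p)."""
    return gamma * p / (1.0 + (gamma - 1.0) * p)

def h_gamma_tilde(p, gamma):
    """Down breach boundary: 1 - h^(gamma)(1 - p)."""
    return 1.0 - h_gamma(1.0 - p, gamma)

# For gamma = 3: a prior of 0.5 may move up to at most 0.75 and down to 0.25.
up, down = h_gamma(0.5, 3.0), h_gamma_tilde(0.5, 3.0)
```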
Let H = {h^(γ)(·); γ > 1}, i.e., the class of all functions of the form h^(γ)(·). Then, h-CSIP with h ∈ H represents all precise PBRs, which are most relevant for choosing a privacy requirement and communicating a privacy guarantee. A logical conclusion is that for strict privacy protection, we should think only in terms of h-CSIP and limit h to H. A practical meaning of h^(γ)-CSIP may not be immediate, but as we show next, it amounts to imposing a bound on all Bayes factors.
For a given α, the prior odds of Q is P_α(Q)/[1 − P_α(Q)] and its posterior odds is P_α(Q|d_i)/[1 − P_α(Q|d_i)]. Now, take any γ > 1 and consider the following privacy requirement:

{P_α(Q|d_i)/[1 − P_α(Q|d_i)]} / {P_α(Q)/[1 − P_α(Q)]} ≤ γ (3.6)

for all α, Q and d_i such that 0 < P_α(Q) < 1 and P_α(Z = d_i) > 0. The left side of (3.6) is the ratio of the posterior odds of Q to its prior odds, or the Bayes factor for testing X ∈ Q against X ∉ Q; see Kass and Raftery (1995) for a very informative discussion of Bayes factors. Thus, (3.6) requires all Bayes factors to be less than or equal to γ. Considering Q^c, it can be seen that (3.6) also implies that all Bayes factors are at least γ^{-1}. In summary, (3.6) requires all Bayes factors to be between γ^{-1} and γ. The above criterion is analogous to β-factor privacy; while (3.6) uses the ratio of posterior and prior odds, β-factor privacy uses the ratio of the two probabilities. In other words, β-factor privacy uses the probability scale whereas (3.6) uses the odds scale.
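A numerical sketch of the Bayes factor bound (assumed Warner-type P; for Q = {c_1} the Bayes factor given d_i reduces to p_i1/p_i2, so it never exceeds η(P)):

```python
import numpy as np

P = np.array([[0.75, 0.25], [0.25, 0.75]])
eta = (P.max(axis=1) / P.min(axis=1)).max()   # eta(P) = 3

rng = np.random.default_rng(1)
for _ in range(1000):
    a1 = rng.uniform(0.01, 0.99)              # random intruder prior
    alpha = np.array([a1, 1 - a1])
    prior = alpha[0]                          # Q = {c_1}
    for i in range(2):
        post = P[i, 0] * alpha[0] / (P[i] @ alpha)
        bayes_factor = (post / (1 - post)) / (prior / (1 - prior))
        assert bayes_factor <= eta + 1e-9     # bounded by the parity
```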
By routine algebra, it can be seen that (3.6) is equivalent to

P_α(Q|d_i) ≤ γ P_α(Q) / [1 + (γ − 1) P_α(Q)]. (3.7)

Notice that the right side of (3.7), considered as a function of P_α(Q), is the same as the function h^(γ)(·) defined in (3.5). So, h^(γ)-CSIP is equivalent to the privacy requirement of (3.6), and γ can be interpreted conveniently as the upper bound on all Bayes factors (the lower bound being γ^{-1}). It is also seen that any precise PBR corresponds to the privacy requirement of (3.6), with a matching value of γ. Based on the previous discussions, we reach the following practical conclusions.
(1) While ρ1-to-ρ2 and β-factor privacy, and more generally (h_l, h_u)-SIP, are intuitively sensible, we should discuss, assess and communicate privacy only in terms of h-CSIP with h ∈ H, or equivalently in terms of bounds on Bayes factors as in (3.6). Both the graphical representation, as in Figure 4, and the Bayes factor interpretation should be helpful for choosing suitable values of γ in practical applications. Kass and Raftery (1995) recommend interpreting a Bayes factor of 20 or larger as strong evidence, which suggests that values around 20 might be suitable for γ in our context.
(2) Satisfying any privacy requirement reduces strictly to using a procedure with a sufficiently small parity, as stated in Theorem 3.1. We can always find a procedure that provides the required privacy, but not uniquely, because for any γ > 1, there exist many P with η(P) = γ. We should compare data utility to choose one procedure among all privacy satisfying procedures. We discuss this in the next section.
(3) Our results also bring out a new interpretation of ε-DLP (see Definition 2.1). Recall that an RR procedure P provides ε-DLP if and only if η(P) ≤ exp(ε). Then, in view of Theorem 3.1 and subsequent discussions, ε-DLP is equivalent to h^(γ)-CSIP, with γ = exp(ε). Thus, ε-DLP can be explained by the PBR of the corresponding h^(γ)-CSIP.

Comparison of data utility
In earlier sections, we observed that a randomization procedure P provides strict privacy protection if and only if η(P) ≤ γ, where γ > 1 is determined by the privacy requirement. Recall that P_{m×k} must be a transition probability matrix (TPM) and each row of P must contain at least one nonzero value. We also argue that no two rows of P should be proportional to each other. The ith row is the (nonparametric) likelihood function when Z = d_i. So, from the likelihood perspective, if rows i and j are proportional, the statistical information from observing Z = d_i and Z = d_j is the same, and the two outcomes (and the corresponding rows) should be merged. Alternatively, two proportional rows can be viewed as obtained by randomly splitting one outcome into two. (This is analogous to irrelevantly splitting one choice into two in discrete choice analyses, e.g., splitting "bus" into "blue bus" and "green bus" in mode-of-transportation choice.) Also, if proportional rows are allowed, then S_Z and m cannot be defined uniquely.
With the natural constraints discussed above, the class of all privacy preserving procedures, at a desired level γ, is

C_γ = {P_{m×k} : m ≥ 2, η(P) ≤ γ, and P has no proportional rows}.
As we noted earlier, one may also impose m ≥ k and rank(P) = k, for model identifiability. However, these are not needed for our results. Intuitively, we should compare data utility to select a procedure from C_γ for application. However, "data utility" is difficult to define and measure, as the data may be used and analyzed in different ways and for various purposes. It may not even be possible to anticipate all future usage of the data at the time of the survey. Recognizing this, we shall first discuss some admissibility results using Blackwell's (1951, 1953) notion of sufficiency of experiments, which is agnostic about inferential goals and loss functions.

Admissibility
Adopting Blackwell's (1951, 1953) criterion to our context, we introduce the following.

Definition 4.1. For two randomization procedures A_{r×k} and P_{m×k}, we say that P is at least as informative (or good) as A, denoted P ⪰ A, if there exists a transition probability matrix C_{r×m} such that A = CP. In this case, P is also said to be sufficient for A.
Definition 4.2. If P ⪰ A and also A ⪰ P, then A and P are equivalent, which will be denoted P ∼ A. We say that P is better than A, and write P ≻ A, if P ⪰ A but A is not sufficient for P, i.e., A ⋡ P.
It is easy to see that if C and P are TPMs, then A = CP is also a TPM. The intuitive idea behind Definitions 4.1 and 4.2 is that if A = CP, then the procedure A is equivalent to further randomizing (by C) the output of P, and because of the additional randomization, A cannot be more informative than P. Mathematically, it follows that if P is sufficient for A, then for any inference problem with a given loss function and any inference rule δ based on the data from A, there exists a rule δ* based on P such that the risk of δ* is never larger than the risk of δ. Naturally, one should use only admissible procedures. In the privacy literature, Blackwell's criterion has been used by Kairouz et al. (2016b).
Remark 4.1. The restriction that our TPMs must not contain proportional rows can be further justified as follows. Consider a procedure A_{m×k} and suppose its first two rows, denoted a_1 and a_2, are proportional, with a_1 = δ(a_1 + a_2) for some 0 < δ < 1. Construct P*_{(m−1)×k} by merging the first two rows of A. Then, A and P* are equivalent, as A = CP* and P* = C*A, where C splits the first row of P* into a_1 and a_2 (with weights δ and 1 − δ), C* adds the first two rows of A, and both matrices act as the identity on the remaining rows. We can repeat this process to eliminate all proportional rows and thus obtain a P such that P has no proportional rows and P ∼ A.
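A minimal sketch of the merging step in Remark 4.1, with assumed numbers:

```python
import numpy as np

A = np.array([[0.2, 0.1],
              [0.4, 0.2],    # proportional to the first row
              [0.4, 0.7]])

# Merge the two proportional rows; the result is an equivalent TPM.
P_star = np.vstack([A[0] + A[1], A[2]])

assert np.allclose(P_star.sum(axis=0), 1.0)
# Merging preserves the row parity of the merged row (0.2/0.1 = 0.6/0.3 = 2).
```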
Remark 4.2. Intuitively, permuting the rows of P_{m×k}, i.e., relabeling the elements of S_Z, should have no effect on either privacy or data utility. Mathematically, this holds easily. Specifically, it can be seen that if C_{m×m} is a permutation matrix and A = CP, then (i) η(A) = η(P) and (ii) A ∼ P (as C^{-1} is also a permutation matrix and hence a TPM). We also have the following result, whose proof is given in the Appendix.

Theorem 4.1. If P ⪰ A, then η(A) ≤ η(P).

This result is intuitive: further randomization should not reduce privacy (by increasing parity). It also exhibits a trade-off between privacy and data utility: if P is at least as informative as A, in the sense of P ⪰ A, then A provides at least as much privacy (by the parity measure) as P.
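The trade-off can be illustrated numerically with assumed matrices: post-randomizing P by a TPM C yields A = CP with parity no larger than that of P.

```python
import numpy as np

def parity(M):
    return (M.max(axis=1) / M.min(axis=1)).max()   # all entries positive here

P = np.array([[0.9, 0.1], [0.1, 0.9]])             # parity 9
C = np.array([[0.8, 0.1], [0.2, 0.9]])             # extra randomization (TPM)
A = C @ P

assert np.allclose(A.sum(axis=0), 1.0)             # A is again a TPM
assert parity(A) <= parity(P) + 1e-12              # A is at least as private
```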
Next, let C^0_γ denote all P in C_γ satisfying the following two conditions:

C1: η_i(P) = γ for all i;
C2: each row of P contains exactly two distinct values.
We shall prove that a randomization procedure P is admissible within C_γ if and only if P ∈ C^0_γ. We organize this result in several parts.

Theorem 4.2. Any randomization procedure A ∈ C^0_γ is admissible within C_γ.

Proof. Take any A ∈ C^0_γ. We shall prove that if any P ∈ C_γ is sufficient for A, then A must be equivalent to P. Suppose there exist P_{r×k} ∈ C_γ and a TPM C_{m×r} such that A = CP. Each row of C must contain at least one nonzero element, as A does not have any zero row. We shall see that c_iu ≠ 0 implies that the uth row of P, denoted p_u, is proportional to a_i, the ith row of A. For each i, as η_i(A) = γ, by C1, there exist j and l such that a_ij = γ a_il. For such a_ij and a_il, equality holds in (4.1), and since c_iu ≠ 0, we must have p_uj = γ p_ul. This holds for all j and l such that a_ij = γ a_il. Since a_i contains exactly two distinct (nonzero) values, by C2, considering all pairs (a_ij, a_il) with a_ij = γ a_il, it is seen that p_u ∝ a_i.
The preceding argument implies that each row of C has exactly one nonzero entry; otherwise, P would have proportional rows. Then, if m < r, C must have some zero columns and hence would not be a TPM. Also, if m > r, at least two rows of C must be proportional, and then the corresponding rows of A are also proportional, which is a contradiction. So, we must have m = r, and C must be a permutation matrix in order to be a TPM; thus A ∼ P.
We should note that in the preceding proof we showed not only that A ∼ P but also that P must be a permutation of the rows of A. Consequently, P ∈ C^0_γ, as A ∈ C^0_γ, and we have the following.

Corollary 4.1. If A ∈ C^0_γ, P ∈ C_γ and P ⪰ A, then A ∼ P and P ∈ C^0_γ.

Lemma 4.2. Any randomization procedure A ∈ C_γ with 1 < η_i(A) < γ for some i is inadmissible within C_γ.
Proof. Suppose, without loss of generality, that 1 < η_1(A) = a_11/a_12 < γ. Then, there exists a row i ≥ 2 such that a_i1 < a_i2, because each column of A adds to 1. For brevity, suppose that a_21 < a_22. Construct P_{m×k} as follows: p_1 = a_1 + (1 − ξ)a_2, p_2 = ξ a_2, where ξ ≥ 1 is a constant (to be chosen suitably), and p_i = a_i, i = 3, . . . , m. Note that as all elements of A are positive (implied by η(A) ≤ γ), there exists ξ_0 such that P is a TPM for all 1 ≤ ξ < ξ_0. Also, η_i(A) = η_i(P) for i = 2, . . . , m, and so any difference between η(A) and η(P) comes from the difference between η_1(A) and η_1(P). Next, note that η_1(P) is a continuous function of ξ, and for ξ = 1, η_1(P) = η_1(A) < γ. So, there exists 1 < ξ < ξ_0 for which η_1(P) ≤ γ and consequently η(P) ≤ γ. Take such a value ξ*, and use it in the construction of P. Finally, note that P = CA and A = C^{-1}P, where C is obtained from the identity matrix I_m by replacing its (1, 2) entry with 1 − ξ* and its (2, 2) entry with ξ*. Now, as ξ* > 1, C^{-1} is a TPM, and hence P ⪰ A. Also, as C is nonsingular, P = DA only with D = C. But C is not a TPM, as ξ* > 1, and hence A is not sufficient for P. In summary, P ≻ A, and hence A is inadmissible.

Lemma 4.3. Suppose A ∈ C_γ, η_i(A) equals 1 or γ for all i, and η_i(A) = 1 for some i. Then, there exists P ∈ C_γ such that P ⪰ A and P satisfies the condition C1.
Proof. Note that $A$ can have at most one constant row, because $A \in \mathcal{C}_\gamma$ and thus cannot have proportional rows. For notational simplicity, suppose that $\eta_1(A) = 1$, i.e., all the values in row 1 of $A$ are the same, say $\delta$. Then, from the likelihood perspective, the response $d_1$ does not give any information about $\pi$. Intuitively, we may eliminate the response $d_1$ and distribute its probability (proportionally) to the other responses. Specifically, construct $P$ from $A$ by deleting the first row and multiplying all other elements by $(1 - \delta)^{-1}$. It is easily seen that the row parities of the retained rows remain the same and $A_{m \times k} = C_{m \times (m-1)} P_{(m-1) \times k}$, where all elements of the first row of $C$ are $\delta$ and the remaining rows constitute $(1 - \delta)I_{m-1}$. Thus, $P$ satisfies C1 and $P$ is sufficient for $A$.
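The construction in the preceding proof is easy to check numerically. The following Python sketch (illustrative only; the $3 \times 2$ design and $\gamma = 2$ are hypothetical choices, not from the paper) verifies that deleting the constant row and rescaling preserves the remaining row parities and that $A = CP$, so $P$ is sufficient for $A$.

```python
from fractions import Fraction as F

def parity(row):
    # Row parity: ratio of the largest to the smallest entry.
    return max(row) / min(row)

def matmul(C, P):
    return [[sum(c * p for c, p in zip(crow, pcol)) for pcol in zip(*P)]
            for crow in C]

# Hypothetical 3x2 design A for gamma = 2: row 1 is constant with
# common value delta = 1/5, the other rows have parity exactly gamma,
# and each column sums to 1.
delta = F(1, 5)
A = [[delta, delta],
     [F(8, 15), F(4, 15)],
     [F(4, 15), F(8, 15)]]

# Lemma 4.3 construction: drop the uninformative constant row and
# rescale the remaining rows by 1/(1 - delta).
P = [[a / (1 - delta) for a in row] for row in A[1:]]

# C has first row (delta, delta) and remaining rows (1 - delta) * I.
C = [[delta, delta],
     [1 - delta, F(0)],
     [F(0), 1 - delta]]

A_back = matmul(C, P)   # recovers A, so P is sufficient for A
```

Exact rational arithmetic (`fractions`) makes the identity $A = CP$ checkable without floating-point tolerance.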
If $P$ as constructed in Lemma 4.3 also satisfies C2, i.e., $P \in \mathcal{C}^0_\gamma$, then from Corollary 4.1 it follows that $P$ strictly dominates $A$ and hence $A$ is inadmissible. As we shall show next, if $P$ does not satisfy C2, then $P$ is inadmissible, which implies that $A$ is also inadmissible. Note that together Lemmas 4.2 and 4.3 cover all forms of violations of C1. The following lemma thus completes the proof of Theorem 4.3, stated below.
Lemma 4.4. Any randomization procedure $A \in \mathcal{C}_\gamma$ that satisfies the condition C1 but not C2 is inadmissible within $\mathcal{C}_\gamma$.
Proof. Suppose $A_{m \times k} \in \mathcal{C}_\gamma$ satisfies C1 but not C2. Thus, $\eta_i(A) = \gamma$ for $i = 1, \dots, m$, and at least one row of $A$ contains more than two distinct values. For notational simplicity, suppose the first row contains three or more distinct values and $a_{11}$ is a "middle" value, i.e., $t < a_{11} < T$, where $t = \min_i\{a_{1i}\}$ and $T = \max_i\{a_{1i}\}$. Note that $T/t = \gamma$, as $A$ satisfies C1. Let $\delta = (T - a_{11})/(T - t)$. Consider $P^*_{(m+1) \times k}$ whose rows are: $p_1 = \delta(t, a_{12}, \dots, a_{1k})$, $p_2 = (1 - \delta)(T, a_{12}, \dots, a_{1k})$ and $p_i = a_{i-1}$, $i = 3, \dots, m+1$. It can be verified easily that $P^*$ is a TPM, $\eta_i(P^*) = \gamma$ for $i = 1, \dots, m+1$ and $A = CP^*$, with
$$C = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & I_{m-1} \end{pmatrix},$$
and thus $P^*$ is sufficient for $A$. Repeat the process to eliminate all "middle" values of $A$ and, if it creates any proportional rows, combine those as per Remark 4.1. The resulting $P$ belongs to $\mathcal{C}^0_\gamma$ and is sufficient for $A$. Finally, in view of Corollary 4.1, we can conclude that $P$ strictly dominates $A$ and thus $A$ is inadmissible.

Theorem 4.3. A randomization procedure $P \in \mathcal{C}_\gamma$ is admissible within $\mathcal{C}_\gamma$ only if $P$ satisfies C1 and C2, i.e., $P \in \mathcal{C}^0_\gamma$.
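The row-splitting step in the proof of Lemma 4.4 can also be sketched numerically. In the Python fragment below (illustrative only; the $2 \times 3$ design and $\gamma = 2$ are hypothetical), the middle value $1/2$ in row 1 is split into the two extreme values $t$ and $T$, and merging the two new rows recovers the original row exactly.

```python
from fractions import Fraction as F

def parity(row):
    return max(row) / min(row)

# Hypothetical 2x3 design with gamma = 2: both rows have parity gamma,
# columns sum to 1, and a_13 = 1/2 is a "middle" value (1/3 < 1/2 < 2/3),
# so C2 fails.
gamma = F(2)
A = [[F(1, 3), F(2, 3), F(1, 2)],
     [F(2, 3), F(1, 3), F(1, 2)]]

t, T = min(A[0]), max(A[0])        # t = 1/3, T = 2/3, so T/t = gamma
a_mid = A[0][2]                    # the middle value
delta = (T - a_mid) / (T - t)      # delta = 1/2

# Split row 1 into two rows taking only the extreme values t and T in
# the middle position; C = [[1, 1, 0], [0, 0, 1]] merges them back.
p1 = [delta * x for x in (A[0][0], A[0][1], t)]
p2 = [(1 - delta) * x for x in (A[0][0], A[0][1], T)]
P_star = [p1, p2, A[1]]
```

All rows of the enlarged matrix `P_star` have parity exactly $\gamma$, and its columns still sum to 1, matching the claims in the proof.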

Optimality results
Generally, the class $\mathcal{C}^0_\gamma$ of all admissible procedures contains many $P$. However, for $k = 2$, it is easily seen that if $P_{m \times 2}$ satisfies the condition C1 and has no proportional rows, then we must have $m = 2$. Moreover, $\mathcal{C}^0_\gamma$ consists of only two TPMs, which are also equivalent (by permutation). Thus, both are optimal procedures, one of which is
$$P = \frac{1}{\gamma + 1} \begin{pmatrix} \gamma & 1 \\ 1 & \gamma \end{pmatrix}.$$
For $k \ge 3$, choosing an optimal procedure from $\mathcal{C}^0_\gamma$ requires specific utility (or loss) functions. For a wide class of utility functions, Kairouz et al. (2016b) showed that under $\epsilon$-DLP (which is equivalent to $\eta(P) \le e^\epsilon$), an optimal procedure, under a given $\pi$, can be obtained by solving a linear programming problem. Kairouz et al. (2016a) proved a close version of our Proposition 4.1. Duchi et al. (2018) obtained bounds on minimax risks. In the following, we present one result in a common setting.
Frequently, the categories of the survey variable are used as the possible response categories, i.e., $m = k$ and $d_i = c_i$, $i = 1, \dots, k$, and consequently $S_Z = S_X$. In such cases, a common desire is to retain the original values as much as possible while meeting the privacy requirement. One mathematical formulation of this idea is to choose $P_{k \times k}$ to maximize $\sum_i p_{ii}$, the trace of $P$, subject to $\eta(P) \le \gamma$, where $\gamma$ is specified. The optimal $P$ for this objective is given by the following result.

Theorem 5.1. Suppose $P_{k \times k}$ is a TPM with $\eta(P) \le \gamma$. Then: (a) $\mathrm{trace}(P) \le k\gamma/(\gamma + k - 1)$; and (b) equality holds in (a) if and only if $p_{ii} = \gamma/(\gamma + k - 1)$ for all $i$ and $p_{ij} = 1/(\gamma + k - 1)$ for all $i \ne j$.

Proof. Take any $P_{k \times k}$ satisfying $\eta(P) \le \gamma$, which implies that $p_{ii} \le \gamma p_{ij}$ for all $i \ne j$. For fixed $i$, summing over $j \ne i$ and then adding $p_{ii}$ to both sides, we get
$$\sum_{j=1}^k p_{ij} \ge \frac{\gamma + k - 1}{\gamma}\, p_{ii}.$$
Then, adding both sides of the last inequality over $i$, and using the fact that for each $j$, $\sum_{i=1}^k p_{ij} = 1$, we obtain the inequality in (a). The "if" part of (b) is easy to verify. For the "only if" part, the chain of inequalities in the preceding proof shows that equality in (a) holds if and only if $p_{ij} = p_{ii}/\gamma$ for all $i \ne j$. This implies that $p_{ij} = a_i/\gamma$ for all $i \ne j$, where $a_1, \dots, a_k$ denote the diagonal elements of $P$. Now, as each column of $P$ adds to 1, i.e., $a_j + \frac{1}{\gamma}\sum_{i \ne j} a_i = 1$, we obtain
$$a_j\Big(1 - \frac{1}{\gamma}\Big) = 1 - \frac{1}{\gamma}\sum_{i=1}^k a_i \quad \text{for all } j = 1, \dots, k.$$
Thus, we must have $a_1 = \cdots = a_k$, and hence $p_{ii} = \gamma/(\gamma + k - 1)$ for all $i$ and $p_{ij} = 1/(\gamma + k - 1)$ for all $i \ne j$, as each column of $P$ must add to 1.

Let $P_0$ denote the optimal TPM (for given $k$ and $\gamma$) given above. Thus, the elements of $P_0$ are: $p_{ii} = \gamma/(\gamma + k - 1)$ for all $i$ and $p_{ij} = 1/(\gamma + k - 1)$ for all $i \ne j$. This $P_0$ has some attractive features and has received much attention. Note that $P_0$ is in $\mathcal{C}^0_\gamma$ and hence admissible. Agrawal et al. (2009) refer to $P_0$ as "the Gamma-Diagonal matrix" due to its structure; it has a common diagonal value and also a common off-diagonal value. They also proved an optimality property of $P_0$, in terms of lowest condition number, among all symmetric positive definite $P$ with $\eta(P) \le \gamma$. Kairouz et al. (2016b) refer to $P_0$ as "the randomized response mechanism" and present a certain mutual information optimality of $P_0$.
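The gamma-diagonal mechanism $P_0$ is straightforward to construct and check numerically. The following Python sketch (illustrative only; $k = 4$ and $\gamma = 3$ are hypothetical values) builds $P_0$ and verifies that it attains the trace bound $k\gamma/(\gamma + k - 1)$ while every row has parity exactly $\gamma$.

```python
from fractions import Fraction as F

def parity(row):
    # Row parity: ratio of the largest to the smallest entry.
    return max(row) / min(row)

def make_P0(k, gamma):
    # Gamma-diagonal mechanism: common diagonal value gamma/(gamma+k-1)
    # and common off-diagonal value 1/(gamma+k-1).
    d = gamma + k - 1
    return [[gamma / d if i == j else 1 / d for j in range(k)]
            for i in range(k)]

k, gamma = 4, F(3)
P0 = make_P0(k, gamma)
trace = sum(P0[i][i] for i in range(k))
bound = k * gamma / (gamma + k - 1)   # the trace upper bound k*gamma/(gamma+k-1)
```

With $k = 4$ and $\gamma = 3$ the bound equals 2, and the constructed $P_0$ attains it exactly.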

Discussion
In this paper, we investigated the logic underlying the $\rho_1$-to-$\rho_2$ and $\beta$-factor privacy criteria in full generality. We gave new insight and clarity using a geometrical representation of privacy breach regions. We introduced the concepts of precise PBR and canonical strict information privacy to accurately describe the privacy demands of any stated criterion. Our Theorem 3.1, which gives necessary and sufficient conditions for attaining a desired privacy level, is a significant result. It also yields a numerical measure of the privacy demand of any given PBR, and shows that the parity of an RR procedure determines its privacy guarantee. It also identifies a set of practically relevant PBRs and tells us to choose one of those when setting privacy requirements in real applications.
We compared the data utility of privacy satisfying RR procedures using sufficiency of experiments, which is a strong criterion that does not rely on any specific loss function or utility measure. The characterization of the class of all privacy preserving admissible RR procedures, $\mathcal{C}^0_\gamma$, is an important finding. We also obtained the optimal procedure under a specific criterion, viz., maximizing the trace of $P$ subject to the privacy constraint.
We believe that the requirement of no privacy breach for any property $Q$ is overly stringent. Cell collapsing (or generalization) is a common privacy protection tool, which can be viewed as a special case of RR, with $P_\alpha(Z = d_i \mid X = c_j) = 1$ if $c_j$ is collapsed into $d_i$ (or $d_i$ contains $c_j$) and 0 otherwise. But the parity of any such TPM is infinite, unless $m = 1$, in which case data utility is null. So, cell collapsing cannot give any strict information privacy without totally destroying data utility. It would be useful to modify the criterion by requiring no privacy breach for a subset of properties $Q$ but for all priors. We leave choosing $Q$ and appropriately modifying our results as future research topics.
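The observation about cell collapsing can be checked directly: viewed as a TPM, any nontrivial collapsing has rows containing both 0 and 1, so its parity is infinite. A minimal Python illustration (the 3-category collapsing below is a hypothetical example, not from the paper):

```python
import math

def row_parity(row):
    # Parity of a row: ratio of its largest to smallest entry;
    # infinite whenever the row contains a zero.
    lo, hi = min(row), max(row)
    return math.inf if lo == 0 else hi / lo

# Hypothetical collapsing of k = 3 categories into m = 2 cells:
# c1, c2 -> d1 and c3 -> d2.  Entries are P(Z = d_i | X = c_j).
P_collapse = [[1, 1, 0],
              [0, 0, 1]]

# With m = 1 (everything collapsed into one cell) the parity is 1,
# but the released value carries no information at all.
P_trivial = [[1, 1, 1]]
```

Since strict information privacy requires a finite parity bound $\gamma$, no nontrivial collapsing can satisfy it, exactly as argued above.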

Appendix: proofs
Proof of Lemma 3.2. The 'only if' part of the lemma follows readily. So, we shall prove only the 'if' part. Suppose (3.4) holds. Now, take any $\alpha$, $Q \subseteq \{c_1, \dots, c_k\}$ and $d_i$ such that $P_\alpha(Q) > 0$ and $P_\alpha(Z = d_i) > 0$. Suppose $c_q \in Q$ is such that $p_{iq} \ge p_{ij}$ for all $j$ such that $c_j \in Q$. Consider the prior $\tilde{\alpha}$ with elements: $\tilde{\alpha}_j = \alpha_j$ if $c_j \notin Q$, $\tilde{\alpha}_q = P_\alpha(Q)$, and $\tilde{\alpha}_j = 0$ for all other $j$. Then, we have $P_{\tilde{\alpha}}(X = c_q) = \tilde{\alpha}_q = P_\alpha(Q)$, and the condition (3.4) must hold for all $0 < \alpha_j < 1$ and all $\{w_l\}$ such that $0 \le w_l \le 1$ and $\sum_{l \ne j} w_l = 1$. This holds if and only if (6.3) is satisfied.
The infimum in (6.3) is $\min\{p_{il}/p_{ij} : l = 1, \dots, k,\ l \ne j\}$. Moreover, it must be positive in order to satisfy (6.3), because $h(p)$ cannot be 1 for all $0 < p < 1$, and hence the right side of (6.3) is positive. This implies that $p_{ij}$ must be positive for all $j = 1, \dots, k$. So, for our fixed $i$, (3.4) holds for all $j$ and $\alpha$ if and only if (6.3) holds for $j = 1, \dots, k$, or, equivalently, if and only if (6.4) holds.
Note that both sides of (6.4) are positive and finite, and the inequality (6.4) can be recognized as the condition stated in the lemma.

Proof of Theorem 4.1. The 'if' part follows easily, as noted in Remark 4.2.
To prove the 'only if' part, suppose $A$ and $P$ are equivalent, i.e., there exist two TPMs $C_{m \times r}$ and $C^*_{r \times m}$ such that $A = CP$ and $P = C^*A$. Then, $(C^*C)_{r \times r}$ and $(CC^*)_{m \times m}$ are TPMs. Also, $C^*CP = C^*A = P$, or $(C^*C - I)P = 0$, and similarly $(CC^* - I)A = 0$. These imply, by Lemma 6.1 (given below), that both $CC^*$ and $C^*C$ are identity matrices, and consequently we must have $m = r$ (since both are of full rank) and $C^{-1} = C^*$. Now, since both are TPMs, $C$ must be a permutation matrix (Minc, 1988, p. 3).

Lemma 6.1. Let $B_{m \times m}$ be a TPM and let $P$ be a TPM with no proportional rows such that $(B - I)P = 0$. Then, $B = I$.

We shall use the following concepts and results to prove this lemma.

Definition 6.1. (Chakravarti, 1975) A square matrix $B_{m \times m}$ is said to be reducible if there exists a permutation matrix $Q$ such that
$$Q^{-1}BQ = \begin{pmatrix} R & 0 \\ S & N \end{pmatrix}, \qquad (6.5)$$
where $R$ and $N$ are square matrices. Otherwise, $B$ is called irreducible.
If $B$ is the TPM of a Markov chain, then $B$ being irreducible means that one can always find a path between any two states. Note that $Q^{-1}BQ$ permutes the diagonal entries of $B$ by exchanging the corresponding rows and columns; we call this a diagonal permutation in what follows. In (6.5), if $R$ or $N$ is still reducible, it can be further reduced to the above form through diagonal permutations. Thus, if $B$ is reducible, then through diagonal permutations we can obtain a block lower-triangular matrix with irreducible diagonal blocks.
Theorem 6.1. (Chakravarti, 1975) If a non-negative matrix $B_{m \times m} = ((b_{ij}))$ is irreducible, then the matrix $F = B - D(r)$ must have rank $m - 1$, where $D(r)$ is the diagonal matrix with entries $(r_1, r_2, \dots, r_m)$ and $r_j = \sum_{i=1}^m b_{ij}$.

Definition 6.2. (Taussky, 1949) Column $j$ of a square matrix $B = ((b_{ij}))$ is called weakly diagonally dominant if $\sum_{i \ne j} |b_{ij}| \le |b_{jj}|$. It is called strictly diagonally dominant if '<' holds.
Theorem 6.2. (Taussky, 1949) Suppose $B_{m \times m}$ is an irreducible matrix, all columns of $B$ are weakly diagonally dominant, and at least one column is strictly diagonally dominant. Then, $B$ is nonsingular.
Proof of Lemma 6.1. First, suppose, if possible, that $B_{m \times m}$ is irreducible. Let $V_0$ denote the vector space that is orthogonal to the row space of $(B - I)$. Note that if $(B - I)P = 0$, then all columns of $P$ must be in $V_0$. Applying Theorem 6.1 to $B$, and noting that each column of $B$ adds to 1 as $B$ is a TPM, we obtain $\mathrm{rank}(B - I) = m - 1$. This implies that the dimension of $V_0$ is 1 and hence all columns of $P$ are proportional. Actually, they are identical (as the sum of each column is 1) and hence all rows of $P$ are also identical. This contradicts the assumption that $P$ has no proportional rows. Thus, $B$ cannot be irreducible.
Next, suppose $B$ is reducible. Then, there exists a permutation matrix $Q$ such that $Q^{-1}BQ$ is a block lower-triangular matrix with irreducible diagonal blocks $R_1, R_2, \dots, R_g$. If all of these blocks are $1 \times 1$ identity matrices, then $B = I$, as $Q^{-1}BQ$ is a TPM. If not, suppose $R_{t+1}$, with dimension $s \times s$, is the first block that is not a $1 \times 1$ identity matrix. This implies that all the off-diagonal entries in the first $t$ columns of $Q^{-1}BQ$ must be 0 (when $t \ge 1$). Take such a $Q$ and denote $R = R_{t+1} = ((r_{ij}))$, to obtain
$$Q^{-1}BQ = \begin{pmatrix} I_t & 0 & 0 \\ 0 & R & 0 \\ 0 & L & N \end{pmatrix} \qquad (6.6)$$
and
$$(Q^{-1}BQ - I)P^* = 0, \qquad (6.7)$$
where $P^* = Q^{-1}P$, which is also a TPM with no proportional rows. In view of (6.6), equation (6.7) implies that $(R - I)P_s = 0$, where $P_s$ consists of rows $t+1$ to $t+s$ of $P^*$. Here, each column of $P_s$ is orthogonal to the rows of $(R - I)$, and hence must be in $V_1$, the orthogonal complement of the row space of $(R - I)$. We shall consider two cases to examine $\mathrm{rank}(R - I)$ and its implications.
(i) $L = 0$, or $L$ does not exist (i.e., $s = m - t$). Here, $R$ is a TPM. Also, $s \ge 2$, since $R$ is not a $1 \times 1$ identity matrix. Applying Theorem 6.1 to $R$, and the argument used earlier (for irreducible $B$), we see that $\mathrm{rank}(R - I) = s - 1$. So, the dimension of $V_1$ is 1, implying that all columns of $P_s$ are constant multiples of a common vector, and consequently all rows of $P_s$ are proportional. This contradicts the fact that $P^*$ has no proportional rows. So, $L$ cannot be a null matrix.
(ii) $L \ne 0$. Here, we shall apply Theorem 6.2 to $R^* = (R - I) = ((r^*_{ij}))$. First, $R^*$ is irreducible, as $R$ is. Next, as each column of the right side of (6.6) adds to 1, we get $\sum_{i=1}^s r_{ij} \le 1$ for $j = 1, \dots, s$, and '<' holds for at least one $j$, as $L \ne 0$. This shows, in view of $0 \le r_{ij} \le 1$, $r^*_{ii} = r_{ii} - 1$ and $r^*_{ij} = r_{ij}$ for $i \ne j$, that $\sum_{i \ne j} |r^*_{ij}| \le |r^*_{jj}|$ for $j = 1, \dots, s$, and '<' holds for some $j$. Thus, $R^*$ satisfies the conditions of Theorem 6.2, and hence $\mathrm{rank}(R^*) = s$, i.e., $(R - I)$ is nonsingular. Now, $(R - I)P_s = 0$ implies that $P_s = 0$, which is a contradiction. From the above discussion of all possible cases, we must conclude that $B = I$.
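As a numerical sanity check on Lemma 6.1 (an illustration, not part of the proof): for a non-identity TPM $B$, the only TPMs $P$ solving $(B - I)P = 0$ have proportional rows, which is exactly what the lemma's hypothesis rules out. A hypothetical $2 \times 2$ example in Python:

```python
from fractions import Fraction as F

def matmul(B, P):
    return [[sum(b * p for b, p in zip(brow, pcol)) for pcol in zip(*P)]
            for brow in B]

# A non-identity, irreducible TPM B (each column sums to 1).
B = [[F(1, 2), F(1, 2)],
     [F(1, 2), F(1, 2)]]

# B - I has rank 1, so any P with (B - I)P = 0, i.e., BP = P, must have
# all columns in the one-dimensional null space spanned by (1, 1) --
# forcing the rows of P to coincide.  A TPM solution necessarily looks like:
P = [[F(1, 2), F(1, 2), F(1, 2)],
     [F(1, 2), F(1, 2), F(1, 2)]]

BP = matmul(B, P)   # equals P, confirming (B - I)P = 0
```

This matches the rank argument in the proof: once $\mathrm{rank}(B - I) = m - 1$, the null space is one-dimensional and $P$ cannot avoid proportional rows.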