Optimal locally private estimation under $\ell_p$ loss for $1\le p\le 2$

We consider the minimax estimation problem of a discrete distribution with support size $k$ under locally differential privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $\epsilon$ measures the privacy level of a privatization scheme. In our previous work (IEEE Trans. Inform. Theory, 2018), we proposed a family of new privatization schemes and the corresponding estimator. We also proved that our scheme and estimator are order optimal in the regime $e^{\epsilon} \ll k$ under both $\ell_2^2$ (mean square) and $\ell_1$ loss. In this paper, we sharpen this result by showing asymptotic optimality of the proposed scheme under the $\ell_p^p$ loss for all $1\le p\le 2.$ More precisely, we show that for any $p\in[1,2]$ and any $k$ and $\epsilon,$ the ratio between the worst-case $\ell_p^p$ estimation loss of our scheme and the optimal value approaches $1$ as the number of samples tends to infinity. The lower bound on the minimax risk of private estimation that we establish as a part of the proof is valid for any loss function $\ell_p^p, p\ge 1.$


Introduction
This paper continues our work [1]. The context of the problem that we consider is related to a major challenge in the statistical analysis of user data, namely, the conflict between learning accurate statistics and protecting sensitive information about the individuals. As in [1], we rely on a particular formalization of user privacy called differential privacy, introduced in [2,3]. Generally speaking, differential privacy requires that the adversary not be able to reliably infer an individual's data from public statistics even with access to all the other users' data. The concept of differential privacy has been developed in two different contexts: the global privacy context, for instance, when institutions release statistics related to groups of people [4], and the local privacy context, when individuals disclose their personal data [5].
In this paper, we consider the minimax estimation problem of a discrete distribution with support size $k$ under local differential privacy. This problem has been studied in the non-private setting [6,7], where we can learn the distribution from the raw samples. In the private setting, we need to estimate the distribution of the raw samples from the privatized samples, which are generated independently from the raw samples according to a conditional distribution $Q$ (also called a privatization scheme). Given a privacy parameter $\epsilon > 0$, we say that $Q$ is $\epsilon$-locally differentially private if the probabilities of the same output conditional on different inputs differ by a factor of at most $e^{\epsilon}$. Clearly, a smaller $\epsilon$ means that it is more difficult to infer the original data from the privatized samples, and thus corresponds to higher privacy. For a given $\epsilon$, our objective is to find the optimal $\epsilon$-private scheme that minimizes the expected estimation loss for the worst-case distribution. In this paper, we are mainly concerned with the scenario where we have a large number of samples, which captures the modern trend toward "big data" analytics.

Existing results
The following two privatization schemes are the most well known in the literature: the $k$-ary Randomized Aggregatable Privacy-Preserving Ordinal Response ($k$-RAPPOR) scheme [8,9] and the $k$-ary Randomized Response ($k$-RR) scheme [10,11]. The $k$-RAPPOR scheme is order optimal in the high privacy regime where $\epsilon$ is very close to $0$, and the $k$-RR scheme is order optimal in the low privacy regime where $e^{\epsilon} \approx k$ [12]. Recently, a family of privatization schemes and the corresponding estimators were proposed independently by Wang et al. [13] and the present authors [1]. In [1], we further showed that under both $\ell_2^2$ (mean square) and $\ell_1$ loss, these privatization schemes and the corresponding estimators are order optimal in the medium to high privacy regimes where $e^{\epsilon} \ll k$. Subsequent to our work, [14] proposed another privatization scheme and proved that it is order optimal in all regimes for the $\ell_1$ loss. At the same time, prior to this paper, no schemes were shown to be asymptotically optimal in the literature.
Duchi et al. [15] gave an order-optimal lower bound on the minimax private estimation loss for the high privacy regime where $\epsilon$ is very close to $0$. In [1], we proved a stronger lower bound which is order optimal in the whole region $e^{\epsilon} \ll k$. This lower bound implies that the schemes and the estimators proposed in [13,1] are order optimal in this regime. Here order optimal means that the ratio between the true value and the lower bound is upper bounded by a constant (larger than $1$) when $n$ and $k/e^{\epsilon}$ both become large enough.

Our contributions
In this paper, we study the private estimation problem under the $\ell_p^p$ loss for $1\le p\le 2$, which in particular includes the widely used $\ell_1$ and $\ell_2^2$ loss. We prove an asymptotically tight lower bound on the $\ell_p^p$ loss of minimax private estimation for all values of $k$, $\epsilon$, and $1\le p\le 2$. This improves upon the lower bounds in [1] and [15] in three respects. First, although the lower bounds in [1] and [15] are order optimal, they differ from the true value by a factor of several hundred. In practice, an improvement of several percentage points is already considered a substantial advance (see, for instance, [12]), so tighter bounds are of interest. Second, the bounds in [1] and [15] hold only for certain regions of $k$ and $\epsilon$, while the lower bound in this paper holds for all values of $k$ and $\epsilon$. Finally, previous results were limited to the $\ell_1$ and $\ell_2^2$ loss functions, while the results in this paper hold for all $\ell_p^p$ loss functions with $1\le p\le 2$. Furthermore, as an immediate consequence of our lower bound, we show that the schemes and the estimators proposed in [13,1] are universally optimal under the $\ell_p^p$ loss for all $1\le p\le 2$, in the sense that the ratio between the lower bound and the worst-case estimation loss of these schemes and estimators goes to $1$ as $n$ goes to infinity.
In this paper we both generalize the results and shorten the proofs of the preprint [16], which addressed only the case of mean square loss.

Related work
While in this paper we consider only the sample complexity, a recent work by Acharya et al. [14] took communication complexity into consideration and proposed a new privatization scheme with reduced communication complexity while maintaining the optimal order of sample complexity for the $\ell_1$ loss function. Apart from the $\ell_p$ loss measures considered in this paper, significant attention in the literature has been devoted to the $\ell_\infty$ estimation of a discrete distribution (also called the heavy hitters problem) under local differential privacy [17,18,19]. Although in this paper we consider only the case where the same privatization scheme is applied to each raw sample, one can also construct privatization schemes that depend on the values of previously observed privatized samples. Such interactive privatization schemes are important for online and sequential procedures in private learning [20,21,15]. A recent work [22] addresses the private estimation problem of distributional properties when the support size $k$ is not known to the estimator. Other estimation-related problems that have been studied under local differential privacy constraints include testing identity and closeness of discrete distributions [23] and hypothesis testing [24].

Organization of the paper
In Section 2, we formulate the problem and give a more detailed review of the existing results. Section 3 is devoted to an overview of the main results of this paper. The proofs of the main results are given in Sections 4-5.

Problem formulation and existing results
Notation: Let $\mathcal{X} = \{1, 2, \ldots, k\}$ be the source alphabet and let $p = (p_1, p_2, \ldots, p_k)$ be a probability distribution on $\mathcal{X}$. Denote by $\Delta_k = \{p \in \mathbb{R}^k : p_i \ge 0 \text{ for } i = 1, 2, \ldots, k,\ \sum_{i=1}^k p_i = 1\}$ the $k$-dimensional probability simplex. Let $X$ be a random variable (RV) that takes values on $\mathcal{X}$ according to $p$, so that $p_i = P(X = i)$. Denote by $X^n = (X^{(1)}, X^{(2)}, \ldots, X^{(n)})$ the vector formed of $n$ independent copies of the RV $X$.

Problem formulation
In the classical (non-private) distribution estimation problem, we are given direct access to i.i.d. samples $\{X^{(i)}\}_{i=1}^n$ drawn according to some unknown distribution $p \in \Delta_k$. Our goal is to estimate $p$ based on the samples [7]. We define an estimator $\hat p$ as a function $\hat p : \mathcal{X}^n \to \mathbb{R}^k$ and assess its quality in terms of the worst-case risk (expected loss)
$$\sup_{p\in\Delta_k} E_{X^n\sim p^n}\, \ell(\hat p(X^n), p),$$
where $\ell$ is some loss function. The minimax risk is defined as the solution of the saddlepoint problem
$$\inf_{\hat p}\ \sup_{p\in\Delta_k} E_{X^n\sim p^n}\, \ell(\hat p(X^n), p).$$
In the private distribution estimation problem, we can no longer access the raw samples $\{X^{(i)}\}_{i=1}^n$. Instead, we estimate the distribution $p$ from the privatized samples $\{Y^{(i)}\}_{i=1}^n$, obtained by applying a privatization mechanism $Q$ independently to each raw sample $X^{(i)}$. A privatization mechanism (also called a privatization scheme) is a conditional distribution $Q(\cdot\,|\,x)$ of the privatized sample given the raw sample $x$. The privatized samples $Y^{(i)}$ take values in a set $\mathcal{Y}$ (the "output alphabet") that does not have to be the same as $\mathcal{X}$.
The quantities $\{Y^{(i)}\}_{i=1}^n$ are i.i.d. samples drawn according to the marginal distribution $m$ given by
$$m(S) = \sum_{i=1}^k p_i\, Q(S\,|\,i) \quad\text{for any } S \in \sigma(\mathcal{Y}), \qquad (1)$$
where $\sigma(\mathcal{Y})$ denotes an appropriate $\sigma$-algebra on $\mathcal{Y}$. In accordance with this setting, the estimator $\hat p$ is a measurable function $\hat p : \mathcal{Y}^n \to \mathbb{R}^k$. We assess the quality of the privatization scheme $Q$ and the corresponding estimator $\hat p$ by the worst-case risk
$$r^{\ell}_{k,n}(Q, \hat p) := \sup_{p\in\Delta_k} E_{Y^n\sim m^n}\, \ell(\hat p(Y^n), p),$$
where $m^n$ is the $n$-fold product distribution and $m$ is given by (1). Define the minimax risk of the privatization scheme $Q$ as
$$r^{\ell}_{k,n}(Q) := \inf_{\hat p}\, r^{\ell}_{k,n}(Q, \hat p).$$
Definition 2.1. For a given $\epsilon > 0$, a privatization mechanism $Q : \mathcal{X} \to \mathcal{Y}$ is said to be $\epsilon$-locally differentially private if
$$Q(S\,|\,x) \le e^{\epsilon}\, Q(S\,|\,x') \quad\text{for all } x, x' \in \mathcal{X} \text{ and all } S \in \sigma(\mathcal{Y}).$$
Denote by $\mathcal{D}_\epsilon$ the set of all $\epsilon$-locally differentially private mechanisms. Given a privacy level $\epsilon$ and a loss function $\ell$, we seek to find the optimal $Q \in \mathcal{D}_\epsilon$ with the smallest possible minimax risk $r^{\ell}_{k,n}(Q)$ among all the $\epsilon$-locally differentially private mechanisms. As already mentioned, in this paper we consider $\ell = \ell_u^u$ for $1\le u\le 2$, where for $x = (x_1, x_2, \ldots, x_k) \in \mathbb{R}^k$
$$\ell_u^u(x) := \sum_{i=1}^k |x_i|^u.$$
It is easy to see that for any valid privatization scheme $Q$, the order of its $\ell_u^u$ minimax estimation risk is $\Theta(n^{-u/2})$, and $\lim_{n\to\infty} r^{\ell_u^u}_{k,n}(Q)\, n^{u/2}$ is the coefficient of the dominant term, which measures the performance of $Q$ when $n$ is large.
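To make Definition 2.1 concrete, here is a small Python sketch (the function names are ours, not from the paper) that builds the transition matrix of the $k$-RR scheme from Section 1.1 and verifies the $\epsilon$-LDP condition by checking $Q(y|x) \le e^{\epsilon} Q(y|x')$ over all pairs of inputs.

```python
import math

def krr_mechanism(k, eps):
    """Transition matrix Q[y][x] of the k-ary randomized response (k-RR)
    scheme: the true symbol is reported with probability proportional to
    e^eps, and every other symbol with probability proportional to 1."""
    denom = math.exp(eps) + k - 1
    return [[(math.exp(eps) if y == x else 1.0) / denom for x in range(k)]
            for y in range(k)]

def is_locally_private(Q, eps, tol=1e-9):
    """Check Definition 2.1: Q(y|x) <= e^eps * Q(y|x') for all y, x, x'."""
    bound = math.exp(eps)
    return all(qx <= bound * qx2 + tol
               for row in Q          # fix output y, compare all input pairs
               for qx in row
               for qx2 in row)

Q = krr_mechanism(k=5, eps=1.0)
print(is_locally_private(Q, eps=1.0))   # True: k-RR meets its own budget
print(is_locally_private(Q, eps=0.5))   # False: the ratio e^1 exceeds e^0.5
```

The same check applies verbatim to any finite-alphabet mechanism written as a column-stochastic matrix.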
Main Problem: Suppose that the cardinality $k$ of the source alphabet is known to the estimator. For a given privacy level $\epsilon$, we would like to find the optimal (smallest possible) value of $\lim_{n\to\infty} r^{\ell_u^u}_{k,n}(Q)\, n^{u/2}$ among all $Q \in \mathcal{D}_\epsilon$, and to construct a privatization mechanism and a corresponding estimator that achieve this optimal value.
It is this problem that we address, and resolve, in this paper. Specifically, we prove a lower bound on $\lim_{n\to\infty} r^{\ell_u^u}_{k,n}(Q)\, n^{u/2}$ for $Q \in \mathcal{D}_\epsilon$, which implies that the mechanism and the corresponding estimator proposed in [1] are universally optimal for all loss functions $\ell_u^u$, $1\le u\le 2$.

Previous results
In this section we briefly review known results that are relevant to our problem. In Sect. 1.1 we mentioned several papers that have considered it, viz., [10,8,9,11,12,13,15,14]. Here we focus on the results of [1] because they are stated in a form convenient for our presentation. Let $\mathcal{D}_{\epsilon,F}$ be the set of $\epsilon$-locally differentially private schemes with finite output alphabet. Let
$$\mathcal{D}_{\epsilon,E} := \Big\{Q \in \mathcal{D}_{\epsilon,F} : \frac{Q(y\,|\,x)}{\min_{x'\in\mathcal{X}} Q(y\,|\,x')} \in \{1, e^{\epsilon}\} \text{ for all } x \in \mathcal{X} \text{ and all } y \in \mathcal{Y}\Big\}.$$
In [1, Theorem 13], we have shown that
$$\inf_{Q\in\mathcal{D}_{\epsilon}} r^{\ell}_{k,n}(Q) = \inf_{Q\in\mathcal{D}_{\epsilon,E}} r^{\ell}_{k,n}(Q). \qquad (5)$$
As a result, below we limit ourselves to schemes $Q \in \mathcal{D}_{\epsilon,E}$. For such schemes, since the output alphabet is finite, we can write the marginal distribution $m$ in (1) as a vector $m = \big(\sum_{j=1}^k p_j Q(y\,|\,j),\ y \in \mathcal{Y}\big)$. We will also use the shorthand notation $m = pQ$ to denote this vector.
In [1], we introduced a family of privatization schemes parameterized by an integer $d \in \{1, 2, \ldots, k-1\}$. Given $k$ and $d$, let the output alphabet be $\mathcal{Y}_{k,d} = \{y \in \{0,1\}^k : \sum_{i=1}^k y_i = d\}$, the set of binary vectors of Hamming weight $d$.
Definition 2.2 ([1]). Consider the following privatization scheme:
$$Q_{k,\epsilon,d}(y\,|\,i) = \frac{e^{\epsilon}\,\mathbb{1}\{y_i = 1\} + \mathbb{1}\{y_i = 0\}}{\binom{k-1}{d-1}e^{\epsilon} + \binom{k-1}{d}} \qquad (6)$$
for all $y \in \mathcal{Y}_{k,d}$ and all $i \in \mathcal{X}$. The corresponding empirical estimator of $p$ under $Q_{k,\epsilon,d}$ is defined as follows: for $y^n = (y^{(1)}, y^{(2)}, \ldots, y^{(n)}) \in \mathcal{Y}^n_{k,d}$,
$$\hat p_i(y^n) = \frac{(k-1)e^{\epsilon} + \frac{(k-1)(k-d)}{d}}{(k-d)(e^{\epsilon}-1)} \cdot \frac{t_i(y^n)}{n} - \frac{(d-1)e^{\epsilon} + k - d}{(k-d)(e^{\epsilon}-1)}, \quad i \in [k], \qquad (7)$$
where $t_i(y^n) = \sum_{j=1}^n y^{(j)}_i$ is the number of privatized samples whose $i$-th coordinate is $1$.
Some papers [14] call $Q_{k,\epsilon,d}$ the Subset Selection mechanism. It is easy to verify that $Q_{k,\epsilon,d}$ is $\epsilon$-locally differentially private. The worst-case estimation loss under $Q_{k,\epsilon,d}$ and the empirical estimator is calculated in the following proposition.
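The following Python sketch (a numerical sanity check of ours, not part of [1]) enumerates the output alphabet $\mathcal{Y}_{k,d}$, builds the Subset Selection scheme of Definition 2.2, and verifies exactly that the empirical estimator (7) is unbiased, i.e., that $A\,E[t_i/n] - B = p_i$ for an arbitrary input distribution.

```python
import itertools, math

def subset_scheme(k, eps, d):
    """Q_{k,eps,d} of Definition 2.2: outputs are weight-d binary vectors,
    and outputs containing the input symbol are e^eps times more likely."""
    outputs = [y for y in itertools.product((0, 1), repeat=k) if sum(y) == d]
    denom = math.exp(eps) * math.comb(k - 1, d - 1) + math.comb(k - 1, d)
    return outputs, {y: [(math.exp(eps) if y[i] else 1.0) / denom
                         for i in range(k)] for y in outputs}

def estimator_coeffs(k, eps, d):
    """Constants A, B of the empirical estimator (7): p_hat_i = A*t_i/n - B."""
    e = math.exp(eps)
    A = ((k - 1) * e + (k - 1) * (k - d) / d) / ((k - d) * (e - 1))
    B = ((d - 1) * e + (k - d)) / ((k - d) * (e - 1))
    return A, B

k, eps, d = 5, 1.0, 2
outputs, Q = subset_scheme(k, eps, d)
A, B = estimator_coeffs(k, eps, d)

p = [0.4, 0.25, 0.15, 0.1, 0.1]          # an arbitrary input distribution
for i in range(k):
    # E[t_i / n] = P(Y_i = 1), computed exactly from the marginal m = pQ
    m_i = sum(sum(p[x] * Q[y][x] for x in range(k)) for y in outputs if y[i])
    assert abs(A * m_i - B - p[i]) < 1e-9
print("estimator (7) is unbiased")
```

Because the expectation is computed by exact enumeration rather than sampling, the check is deterministic.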
Proposition ([1]). Let $Q = Q_{k,\epsilon,d}$ and suppose that the empirical estimator $\hat p$ is given by (7). Let $m = pQ_{k,\epsilon,d}$. The estimation loss $E_{Y^n\sim m^n}\,\ell_2^2(\hat p(Y^n), p)$ is maximized by the uniform distribution $p_U = (1/k, 1/k, \ldots, 1/k)$, and
$$r^{\ell_2^2}_{k,n}(Q_{k,\epsilon,d}, \hat p) = \frac{(k-1)^2\,(d e^{\epsilon} + k - d)^2}{n\,k\,d\,(k-d)(e^{\epsilon}-1)^2}. \qquad (8)$$
It is clear that the smallest value of the risk $r$ is obtained by optimizing over $d$ in (8). Namely, given $k$ and $\epsilon$, let
$$d^* := \operatorname*{arg\,min}_{1\le d\le k-1}\, r^{\ell_2^2}_{k,n}(Q_{k,\epsilon,d}, \hat p),$$
where ties are resolved arbitrarily. We find that $d^*$ takes one of the following two values: $d^* = \lceil k/(e^{\epsilon}+1)\rceil$ or $\lfloor k/(e^{\epsilon}+1)\rfloor$.
Therefore, when $k/(e^{\epsilon}+1) \le 1$, we have $d^* = 1$, and the scheme $Q_{k,\epsilon,1}$ reduces to the $k$-RR scheme. While in [1] we proved the above results for the mean-square loss (and a similar claim for $\ell = \ell_1$), in this paper we show that they apply more universally. Namely, let
$$M(k,\epsilon) := \frac{(k-1)^2}{k^2(e^{\epsilon}-1)^2} \cdot \frac{(d^* e^{\epsilon} + k - d^*)^2}{d^*(k-d^*)}, \qquad (10)$$
and note that $r^{\ell_2^2}_{k,n}(Q_{k,\epsilon,d^*}, \hat p) = kM(k,\epsilon)/n$. In this paper we show that the quantity $M(k,\epsilon)$ bounds below the main term of the minimax risk for all loss functions $\ell_u^u$, $u \ge 1$.
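The two claims above, $d^* \in \{\lfloor k/(e^{\epsilon}+1)\rfloor, \lceil k/(e^{\epsilon}+1)\rceil\}$ and the formula (10), admit a quick brute-force numerical check; the sketch below (our own helper names) minimizes the quantity in (10), taken with a general parameter $d$, over all admissible $d$.

```python
import math

def M(k, eps, d):
    """The per-coordinate variance coefficient of (10), for a general d."""
    e = math.exp(eps)
    return ((k - 1) ** 2 * (d * e + k - d) ** 2
            / (k ** 2 * (e - 1) ** 2 * d * (k - d)))

def d_star(k, eps):
    """Brute-force minimization of M(k, eps, d) over d = 1, ..., k - 1."""
    return min(range(1, k), key=lambda d: M(k, eps, d))

for k, eps in [(10, 0.5), (100, 1.0), (1000, 3.0)]:
    d = d_star(k, eps)
    t = k / (math.exp(eps) + 1)
    assert d in (math.floor(t), math.ceil(t))   # d* is floor or ceil of t
    print(k, eps, d)
```

Minimizing $M(k,\epsilon,d)$ is equivalent to minimizing the risk (8), since the two differ by the factor $k/n$ that does not depend on $d$.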

Main result of the paper
Our main result is that the scheme $Q_{k,\epsilon,d^*}$ and the empirical estimator $\hat p$ defined by (7) are universally optimal for all loss functions $\ell_u^u$, $1\le u\le 2$. Namely, the following is true.
Theorem 3.1. For every $k$ and $\epsilon > 0$ and every $1\le u\le 2$,
$$\lim_{n\to\infty} \frac{r^{\ell_u^u}_{k,n}(Q_{k,\epsilon,d^*}, \hat p)}{\inf_{Q\in\mathcal{D}_\epsilon} r^{\ell_u^u}_{k,n}(Q)} = 1.$$
This theorem is a consequence of two results, which we state next. Let $X \sim N(0,1)$ and define the constant $C_u := E|X|^u = 2^{u/2}\,\Gamma((u+1)/2)/\sqrt{\pi}$ for $u > 0$.
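The constant $C_u$ is easy to check numerically; the snippet below (ours) evaluates the closed form via the Gamma function, compares it with the classical values $E|X| = \sqrt{2/\pi}$ and $EX^2 = 1$, and adds a Monte Carlo cross-check for a fractional moment.

```python
import math, random

def C(u):
    """Absolute moment E|X|^u of a standard Gaussian X ~ N(0, 1)."""
    return 2 ** (u / 2) * math.gamma((u + 1) / 2) / math.sqrt(math.pi)

# classical special cases: E|X| = sqrt(2/pi) and E X^2 = 1
assert abs(C(1) - math.sqrt(2 / math.pi)) < 1e-12
assert abs(C(2) - 1.0) < 1e-12

# Monte Carlo cross-check for a fractional moment, u = 1.5
random.seed(0)
n = 200_000
mc = sum(abs(random.gauss(0, 1)) ** 1.5 for _ in range(n)) / n
print(C(1.5), mc)   # the two values agree to roughly two decimal places
```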
Theorem 3.2. For every $Q \in \mathcal{D}_\epsilon$ and every $u \ge 1$,
$$\liminf_{n\to\infty}\, n^{u/2}\, r^{\ell_u^u}_{k,n}(Q) \ge k\,C_u\, M(k,\epsilon)^{u/2}.$$
Note that this lower bound holds for any loss function $\ell_u^u$, $u \ge 1$. The proof of this theorem is given in Section 4.
Theorem 3.3. Consider the privatization scheme $Q = Q_{k,\epsilon,d^*}$ and let $\hat p$ be the empirical estimator given by (7). For every $k$ and $\epsilon$ and every $0 < u \le 2$,
$$\lim_{n\to\infty}\, n^{u/2}\, r^{\ell_u^u}_{k,n}(Q_{k,\epsilon,d^*}, \hat p) = k\,C_u\, M(k,\epsilon)^{u/2}.$$
The proof of this theorem is given in Section 5. Note that, unlike Theorem 3.2, the claim that we make here allows the values of $u \in (0,1)$. The special cases of Theorem 3.3 for $u = 1$ and $u = 2$ were addressed in our previous paper [1]; see in particular Theorem 10 therein.
The crux of our argument is in the proof of Theorem 3.2, where we reduce the estimation problem in the $k$-dimensional space to a one-dimensional problem. Generally, it is well known that the local minimax risk can be calculated from the inverse of the Fisher information matrix. However, it is difficult to obtain the exact expression of the inverse of a large matrix, and without it, the path to the desired estimates is not clear. To work around this complication, we view a ball in a high-dimensional space as a union of parallel line segments with a certain direction $v_i$. We first consider the estimation problem on each line segment individually. Since this is a one-dimensional problem, its minimax rate can be easily calculated from the Fisher information of the corresponding parameter. For the estimation of each component $p_i$ of the probability distribution, we choose a suitable direction vector $v_i$. In this way, we reduce the original $k$-dimensional estimation problem to $k$ one-dimensional estimation problems, and then rely on the additivity of the loss function for the final result.

Bayes estimation loss
In light of (5), to prove Theorem 3.2 it suffices to show that for every $u \ge 1$ and every $Q \in \mathcal{D}_{\epsilon,E}$,
$$\liminf_{n\to\infty}\, n^{u/2}\, r^{\ell_u^u}_{k,n}(Q) \ge k\,C_u\, M(k,\epsilon)^{u/2}. \qquad (12)$$
Since the worst-case estimation loss is always lower bounded by the average estimation loss, the minimax risk $r^{\ell_u^u}_{k,n}(Q)$ can be bounded below by the Bayes estimation loss. More specifically, we assume that $p := (p_1, p_2, \ldots, p_k)$ is drawn uniformly from a set $\mathcal{P}$, a neighborhood of the uniform distribution whose size is controlled by a constant $D \gg 1$. Let $P = (P_1, P_2, \ldots, P_k)$ denote the random vector that corresponds to $p$. For a given privatization scheme $Q$ and the corresponding estimator $\hat p := (\hat p_1, \hat p_2, \ldots, \hat p_k)$, the $\ell_u^u$ Bayes estimation loss is defined as
$$r^{\ell_u^u}_{\mathrm{Bayes}}(Q, \hat p) := E\, \ell_u^u(\hat p(Y^n) - P),$$
and the optimal Bayes estimation loss for $Q$ is
$$r^{\ell_u^u}_{\mathrm{Bayes}}(Q) := \inf_{\hat p}\, r^{\ell_u^u}_{\mathrm{Bayes}}(Q, \hat p).$$
We further define the component-wise Bayes estimation loss for $Q$ and $\hat p$:
$$r^{\ell_u^u}_{i,\mathrm{Bayes}}(Q, \hat p_i) := E\, |\hat p_i(Y^n) - P_i|^u, \qquad r^{\ell_u^u}_{i,\mathrm{Bayes}}(Q) := \inf_{\hat p_i}\, r^{\ell_u^u}_{i,\mathrm{Bayes}}(Q, \hat p_i).$$
As mentioned above, we will prove (12) by proving a lower bound (14) on each of the component-wise losses $r^{\ell_u^u}_{i,\mathrm{Bayes}}(Q)$ and summing over $i$.

Lower bound on one-dimensional Bayes estimation loss
Below we will prove a lower bound on $r^{\ell_u^u}_{i,\mathrm{Bayes}}(Q)$. To this end, in this section we consider a one-dimensional Bayes estimation problem. Define the following vectors:
$$v_i := \Big(-\tfrac{1}{k-1}, \ldots, -\tfrac{1}{k-1},\ 1,\ -\tfrac{1}{k-1}, \ldots, -\tfrac{1}{k-1}\Big), \quad i \in [k],$$
where the $1$ is in the $i$-th position and all the other coordinates are $-\frac{1}{k-1}$. Let $p^* := (p_1, p_2, \ldots, p_k) \in \Delta_k$ be a probability distribution and let $S_i(p^*)$ be a line segment with midpoint $p^*$ and direction vector $v_i$, whose length is controlled by a constant $D_1 \gg 1$. Let $p = (p_1, \ldots, p_k)$ be a PMF in the segment $S_i(p^*)$.
Given the value $p_i$, we can find all the other components of $p$ as follows:
$$p_v = p^*_v - \frac{p_i - p^*_i}{k-1}, \qquad v \ne i. \qquad (17)$$
Assume that $p = (p_1, p_2, \ldots, p_k)$ is drawn uniformly from $S_i(p^*)$, and consider the Bayes estimation of $p_i$ from the privatized samples $Y^n$ obtained by applying $Q$ to the raw samples. More precisely, for an estimator $\hat p_i$, we define its Bayes estimation loss as
$$r^{\ell_u^u}_{i,S_i(p^*)}(Q, \hat p_i) := E\, |\hat p_i(Y^n) - P_i|^u;$$
the optimal estimation loss is then
$$r^{\ell_u^u}_{i,S_i(p^*)}(Q) := \inf_{\hat p_i}\, r^{\ell_u^u}_{i,S_i(p^*)}(Q, \hat p_i).$$
Our approach to obtaining the lower bound on this Bayes estimation loss relies on a classical method in asymptotic statistics, namely, local asymptotic normality (LAN) of the posterior distribution [25,26,27,28]. More specifically, let $P_i$ be the random variable corresponding to $p_i$. According to well-known results in the LAN literature (see for instance [26, Chapter 2, Theorem 1.1] and [28, Chapter 6]), when the constant $D_1$ is large enough, the conditional distribution of $P_i$ given $Y^n = y^n$ is approximately Gaussian with variance $(I(p_i))^{-1}$ for almost all $y^n \in \mathcal{Y}^n$ as $n$ goes to infinity, where $I(\cdot)$ is the Fisher information of the parameter $p_i$. Before we calculate the value of $I(p_i)$, let us recall a simple fact about the Gaussian distribution: if $X$ is a Gaussian random variable, then one can easily verify that for any $u \ge 1$,
$$\min_{c\in\mathbb{R}}\, E|X - c|^u = E|X - EX|^u = C_u\, (\operatorname{Var} X)^{u/2}.$$
Therefore, the estimator $\hat p_i(y^n) = E(P_i \mid Y^n = y^n)$ is asymptotically optimal for this Bayes estimation problem under the $\ell_u^u$ loss function for all $u \ge 1$. Since the variance of $P_i$ given $Y^n = y^n$ is $(I(p_i))^{-1}$ for almost all $y^n \in \mathcal{Y}^n$, the Bayes estimation loss of this asymptotically optimal estimator is $C_u\, (I(p_i))^{-u/2}(1 - o(1))$.
Thus we conclude that
$$r^{\ell_u^u}_{i,S_i(p^*)}(Q) \ge C_u\, (I(p_i))^{-u/2}(1 - o(1)) \quad\text{for all } u \ge 1. \qquad (19)$$
Now we are left to calculate the value of $I(p_i)$. To this end, we introduce some notation. For a given privatization scheme $Q \in \mathcal{D}_{\epsilon,E}$ with output size $L$, we write its output alphabet as $\mathcal{Y} = \{1, 2, \ldots, L\}$, and we use the shorthand notation $q_{jv} := Q(j\,|\,v)$ for all $j \in [L]$ and $v \in [k]$. For $j \in [L]$ and $y^n = (y^{(1)}, y^{(2)}, \ldots, y^{(n)}) \in \mathcal{Y}^n$, define $w_j(y^n) := \sum_{v=1}^n \mathbb{1}[y^{(v)} = j]$ to be the number of times that symbol $j$ appears in $y^n$. Let $\mathbb{P}(y^n; p_i)$ be the probability mass function of a random vector $Y^n$ formed of i.i.d. samples drawn according to the distribution $m = pQ$, where the other components of $p$ are calculated from $p_i$ according to (17). The random variables $w_j(Y^n)$ follow the multinomial distribution, and $E\, w_j(Y^n) = n\, m(j)$, $j \in [L]$. Therefore,
$$\log \mathbb{P}(y^n; p_i) = \sum_{j=1}^L w_j(y^n) \log m(j),$$
and the Fisher information of $p_i$ is
$$I(p_i) = n \sum_{j=1}^L \frac{1}{m(j)} \Big(\frac{\partial m(j)}{\partial p_i}\Big)^2, \qquad \frac{\partial m(j)}{\partial p_i} = q_{ji} - \frac{1}{k-1}\sum_{v\ne i} q_{jv},$$
where the $p_v$'s entering $m(j)$ are given by (17). Combining this with (19), we have
$$r^{\ell_u^u}_{i,S_i(p^*)}(Q) \ge C_u\, (I(p_i))^{-u/2} - o(n^{-u/2}) \quad\text{for all } u \ge 1.$$
For $j \in [L]$, define $\bar q_j := \frac{1}{k}\sum_{v=1}^k q_{jv}$. It is clear that when $p^*$ is in the neighborhood of the uniform distribution $p_U$, i.e., when $p_v = 1/k + o_n(1)$ for all $v \in [k]$, we have
$$r^{\ell_u^u}_{i,S_i(p^*)}(Q) \ge C_u \Big(\frac{k^2}{(k-1)^2}\, n \sum_{j=1}^L \frac{(q_{ji} - \bar q_j)^2}{\bar q_j}\Big)^{-u/2} - o(n^{-u/2}) \quad\text{for all } u \ge 1. \qquad (22)$$
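As a consistency check between this Fisher-information bound and Theorem 3.3, one can verify numerically that for $Q = Q_{k,\epsilon,d^*}$ at the uniform distribution, the quantity $\frac{k^2}{(k-1)^2}\sum_j (q_{ji}-\bar q_j)^2/\bar q_j$ equals $1/M(k,\epsilon)$, so the lower bound matches the achievable risk. The Python sketch below (our own code, one choice of $k$ and $\epsilon$) does this for the coordinate $i = 1$.

```python
import itertools, math

def scheme_rows(k, eps, d):
    """Rows indexed by outputs y; entry [v] is q_{yv} = Q_{k,eps,d}(y | v)."""
    outs = [y for y in itertools.product((0, 1), repeat=k) if sum(y) == d]
    denom = math.exp(eps) * math.comb(k - 1, d - 1) + math.comb(k - 1, d)
    return [[(math.exp(eps) if y[v] else 1.0) / denom for v in range(k)]
            for y in outs]

k, eps = 6, 1.0
d = round(k / (math.exp(eps) + 1))        # subset size; equals d* here
Q = scheme_rows(k, eps, d)

# per-sample Fisher information of p_1 at the uniform distribution
fisher = 0.0
for row in Q:
    q_bar = sum(row) / k                  # \bar q_j = m(j) under p = p_U
    fisher += (row[0] - q_bar) ** 2 / q_bar
fisher *= (k / (k - 1)) ** 2

e = math.exp(eps)
M = ((k - 1) ** 2 * (d * e + k - d) ** 2
     / (k ** 2 * (e - 1) ** 2 * d * (k - d)))
print(fisher, 1 / M)                      # the two quantities coincide
```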

Proof of (14)
Our first step in this section will be to prove a lower bound on $r^{\ell_u^u}_{i,\mathrm{Bayes}}(Q)$. Let us phrase the claim in (22) in a more detailed form: for any $\delta > 0$, there exists $D_0 > 0$ such that the bound holds whenever the constant $D_1$ in the definition of $S_i(p^*)$ is larger than $D_0$, where the second inequality follows by Lemma 4.2 (note the inverted inequality of the Lemma because of the negative power $-u/2$). Combining this with (25), we conclude that (14) holds. Thus we have established (14), and this completes the proof of Theorem 3.2.

Proof of Theorem 3.3
We begin by showing that for the privatization scheme $Q_{k,\epsilon,d}$ defined in (6) and the estimator (7), the $\ell_u^u$ estimation loss is maximized by the uniform distribution $p_U$ for all $0 < u \le 2$ when $n$ is large. To shorten the notation, rewrite (7) as
$$\hat p_i(y^n) = A\,\frac{t_i(y^n)}{n} - B, \quad i \in [k], \qquad\text{where}\quad A := \frac{(k-1)e^{\epsilon} + \frac{(k-1)(k-d)}{d}}{(k-d)(e^{\epsilon}-1)}, \quad B := \frac{(d-1)e^{\epsilon} + k - d}{(k-d)(e^{\epsilon}-1)}.$$
In [1] we have shown that the estimator $\hat p_i(y^n)$ is unbiased, i.e.,
$$p_i = A\, E\Big(\frac{t_i(Y^n)}{n}\Big) - B, \quad i \in [k].$$
By definition, $t_i(Y^n)$ is the sum of $n$ i.i.d. Bernoulli random variables with parameter $\frac{p_i + B}{A}$. Therefore the variance of $\frac{t_i(Y^n)}{n}$ is $\frac{1}{n}\big(\frac{p_i}{A} + \frac{B}{A}\big)\big(1 - \frac{p_i}{A} - \frac{B}{A}\big)$, and the variance of $\hat p_i(Y^n)$ is
$$\operatorname{Var} \hat p_i(Y^n) = \frac{A^2}{n}\Big(\frac{p_i + B}{A}\Big)\Big(1 - \frac{p_i + B}{A}\Big).$$
Using the Central Limit Theorem, we then obtain for the absolute moment of $\hat p_i(Y^n)$ around $p_i$ the approximation
$$E\,|\hat p_i(Y^n) - p_i|^u = C_u\, \big(\operatorname{Var} \hat p_i(Y^n)\big)^{u/2}(1 + o(1)),$$
where $C_u$ is the absolute moment of the $N(0,1)$ RV; see Section 3. Therefore,
$$E_{Y^n\sim (pQ_{k,\epsilon,d})^n}\, \ell_u^u(\hat p(Y^n), p) = C_u \sum_{i=1}^k \big(\operatorname{Var} \hat p_i(Y^n)\big)^{u/2}(1 + o(1)) \le C_u\, k^{1-u/2} \Big(\sum_{i=1}^k \operatorname{Var} \hat p_i(Y^n)\Big)^{u/2}(1 + o(1)),$$
where the first inequality follows from the fact that $x^{u/2}$ is a concave function of $x$ on $(0, +\infty)$ for all $0 < u \le 2$, and the sum of the variances is maximized by the uniform distribution, which follows from the Cauchy-Schwarz inequality. Both inequalities hold with equality if and only if $p$ is the uniform distribution. Thus when $n$ is large, for all $0 < u \le 2$ and all $1 \le d \le k-1$, we have
$$r^{\ell_u^u}_{k,n}(Q_{k,\epsilon,d}, \hat p) = E_{Y^n\sim (p_U Q_{k,\epsilon,d})^n}\, \ell_u^u(\hat p(Y^n), p_U).$$
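The maximization step can be illustrated numerically: since the estimator is linear in the Bernoulli counts, $\sum_i \operatorname{Var}\hat p_i(Y^n)$ has a closed form, and a skewed distribution never beats the uniform one. The sketch below (our own helper, one instance) checks this deterministically.

```python
import math

def l2_risk(p, k, eps, d, n):
    """Exact l_2^2 risk sum_i Var(p_hat_i) of the unbiased estimator (7):
    t_i ~ Binomial(n, theta_i) with theta_i = (p_i + B) / A, so
    Var(p_hat_i) = (A^2 / n) * theta_i * (1 - theta_i)."""
    e = math.exp(eps)
    A = ((k - 1) * e + (k - 1) * (k - d) / d) / ((k - d) * (e - 1))
    B = ((d - 1) * e + (k - d)) / ((k - d) * (e - 1))
    total = 0.0
    for p_i in p:
        theta = (p_i + B) / A
        total += A * A * theta * (1 - theta) / n
    return total

k, eps, d, n = 5, 1.0, 2, 1000
uniform = [1 / k] * k
skewed = [0.4, 0.25, 0.15, 0.1, 0.1]
assert l2_risk(skewed, k, eps, d, n) <= l2_risk(uniform, k, eps, d, n)
print(l2_risk(uniform, k, eps, d, n))
```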
In particular, this also holds for $d = d^*$. Next we calculate the estimation loss at the uniform distribution. By symmetry, it is clear that
$$\operatorname{Var} \hat p_i(Y^n) = \frac{1}{k}\, r^{\ell_2^2}_{k,n}(Q_{k,\epsilon,d^*}, \hat p) = \frac{M(k,\epsilon)}{n} \quad\text{for every } i \in [k].$$
Therefore, when the input distribution is uniform, $\hat p_i(Y^n)$ can be approximated for large $n$ by a Gaussian random variable with mean $1/k$ and variance $\frac{M(k,\epsilon)}{n}$. Thus,
$$E_{Y^n\sim (p_U Q_{k,\epsilon,d^*})^n}\, \Big|\hat p_i(Y^n) - \frac{1}{k}\Big|^u = C_u \Big(\frac{M(k,\epsilon)}{n}\Big)^{u/2} + o(n^{-u/2}),$$
so for $0 < u \le 2$,
$$r^{\ell_u^u}_{k,n}(Q_{k,\epsilon,d^*}, \hat p) = E_{Y^n\sim (p_U Q_{k,\epsilon,d^*})^n}\, \ell_u^u(\hat p(Y^n), p_U) = \frac{k}{n^{u/2}}\, C_u\, M(k,\epsilon)^{u/2} + o(n^{-u/2}).$$
This completes the proof of Theorem 3.3.