A Comprehensive Survey on Local Differential Privacy toward Data Statistics and Analysis

Collecting and analyzing massive data generated from smart devices have become increasingly pervasive in crowdsensing, which are the building blocks for data-driven decision-making. However, extensive statistics and analysis of such data will seriously threaten the privacy of participating users. Local differential privacy (LDP) was proposed as an excellent and prevalent privacy model with distributed architecture, which can provide strong privacy guarantees for each user while collecting and analyzing data. LDP ensures that each user’s data is locally perturbed first in the client-side and then sent to the server-side, thereby protecting data from privacy leaks on both the client-side and server-side. This survey presents a comprehensive and systematic overview of LDP with respect to privacy models, research tasks, enabling mechanisms, and various applications. Specifically, we first provide a theoretical summarization of LDP, including the LDP model, the variants of LDP, and the basic framework of LDP algorithms. Then, we investigate and compare the diverse LDP mechanisms for various data statistics and analysis tasks from the perspectives of frequency estimation, mean estimation, and machine learning. Furthermore, we also summarize practical LDP-based application scenarios. Finally, we outline several future research directions under LDP.


Introduction
With the rapid development of wireless communication techniques, Internet-connected devices (e.g., smart devices and IoT appliances) are ever-increasing and generate large amounts of data by crowdsensing [1]. Undeniably, these big data have brought our rich knowledge and enormous benefits, which deeply facilitates our daily lives, such as traffic flow control, epidemic prediction, and recommendation systems [2][3][4]. To make better collective decisions and improve service quality, a variety of applications collect users data through crowdsensing to analyze statistical knowledge of the social community [5]. For example, the third-parties learn rating aggregation by gathering preference options [6], present a crowd density map by recording users locations [7], and estimate the power usage distributions from meter readings [8,9]. Almost all data statistics and analysis tasks fundamentally depend on a basic understanding of the distribution of the data.
However, collecting and analyzing data has incurred serious privacy issues, since such data contain various sensitive information of users [10][11][12]. Even worse, driven by advanced data fusion and analysis techniques, the private data of users are more vulnerable to attack and disclosure in the big data era [13][14][15]. For example, adversaries can infer the daily habits or behavior profiles of family members (e.g., the time of presence/absence in the home, or certain activities such as watching TV).

The main contributions of this survey are summarized as follows:

1. We first provide a theoretical summarization of LDP from the perspectives of the LDP model, the general framework of LDP algorithms, and the variants of LDP.

2. We systematically investigate and summarize the enabling LDP mechanisms for various data statistics and analysis tasks. In particular, the existing state-of-the-art LDP mechanisms are thoroughly reviewed from the perspectives of frequency estimation, mean value estimation, and machine learning.

3. We explore practical applications of LDP to show how LDP is implemented in various settings, including real systems (e.g., Google Chrome, Apple iOS), edge computing, hypothesis testing, social networks, and recommendation systems.

4. We further identify several promising research directions of LDP, which can provide useful guidance for new researchers.

Figure 1 presents the main research categories of LDP and also shows the main structure of this survey. We first provide a theoretical summarization of LDP, which includes the LDP model, the framework of LDP algorithms, and the variants of LDP. Then, from the perspective of research tasks, we summarize the existing LDP-based privacy-preserving mechanisms into three categories: frequency estimation, mean estimation, and machine learning. We further subdivide each category into several subtasks based on different data types. In addition, we summarize the applications of LDP in real practice and other fields.

The rest of the paper is organized as follows. Section 2 theoretically summarizes LDP. The diverse LDP mechanisms for frequency estimation, mean estimation, and machine learning are introduced in Sections 3-5, respectively. Section 6 summarizes the wide application scenarios of LDP, and Section 7 presents some future research directions. Finally, we conclude the paper in Section 8.

Theoretical Summarization of LDP
Formally, let N be the number of users, and let U_i (1 ≤ i ≤ N) denote the i-th user. Let V_i denote the data record of U_i, which is sampled from the attribute domain A consisting of d attributes A_1, A_2, · · · , A_d. For a categorical attribute, its discrete domain is denoted as K = {v_1, v_2, · · · , v_k}, where k = |K| is the size of the domain. Notations commonly used in this paper are listed in Table 1.

LDP Model
Local differential privacy is a distributed variant of DP. It allows each user to perturb her/his value v locally and send only the perturbed data to the server-side aggregator. Therefore, the aggregator never accesses the true data of each user, which provides strong protection. Here, the user's value v acts as the input of a perturbation mechanism, and the perturbed data acts as the output.

Definition
Definition 1 (ε-Local Differential Privacy (ε-LDP) [25,42]). A randomized mechanism M satisfies ε-LDP if and only if for any pair of input values v, v′ in the domain of M, and for any possible output y ∈ Y, it holds that

P[M(v) = y] ≤ e^ε · P[M(v′) = y],    (1)

where P[·] denotes probability and ε is the privacy budget. A smaller ε means stronger privacy protection, and vice versa.
Sequential composition is a key theorem of LDP, which plays an important role in complex LDP algorithms and scenarios.
Theorem 1 (Sequential Composition). Let M_i(v) be an ε_i-LDP algorithm on an input value v, and let M(v) be the sequential composition of M_1(v), · · · , M_m(v). Then M(v) satisfies (∑_{i=1}^{m} ε_i)-LDP.
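As a minimal illustration of Theorem 1 (a sketch, not from the survey): reporting the same bit through several independent randomized-response mechanisms consumes the sum of their budgets.

```python
import math
import random

def rr(bit: int, eps: float) -> int:
    """Binary randomized response: keep the bit w.p. e^eps / (e^eps + 1)."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    return bit if random.random() < p else 1 - bit

def composed_report(bit: int, budgets) -> list:
    """Sequential composition: m independent eps_i-LDP reports on the same
    input jointly satisfy (sum of eps_i)-LDP, per Theorem 1."""
    return [rr(bit, eps) for eps in budgets]

# Three reports with budgets 0.5 + 0.3 + 0.2 jointly satisfy 1.0-LDP.
reports = composed_report(1, [0.5, 0.3, 0.2])
```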

The Principle Method for Achieving LDP
Randomized response (RR) [43] is the classical technique for achieving LDP, which can also be used for achieving global DP [44]. The main idea of RR is to protect a user's private information by giving a plausible response to a sensitive query. That is, a user who possesses a private bit x answers truthfully with probability p and gives the opposite answer with probability 1 − p.
For example, suppose the data collector wants to estimate the true proportion f of smokers among N users. Each user is required to answer the question "Are you a smoker?" with "Yes" or "No". To protect privacy, each user flips an unfair coin that comes up heads with probability p and tails with probability 1 − p. If the coin comes up heads, the user responds with the true answer; otherwise, the user responds with the opposite answer. In this way, the probabilities of answering "Yes" and "No" can be calculated as

P["Yes"] = f · p + (1 − f)(1 − p),    (2)

P["No"] = (1 − f) · p + f (1 − p).    (3)

Then, let N_1 and N − N_1 denote the observed numbers of "Yes" and "No" answers, respectively. From Equations (2) and (3), we have N_1/N ≈ f p + (1 − f)(1 − p) and (N − N_1)/N ≈ (1 − f)p + f(1 − p). Therefore, the estimated proportion of smokers is

f̂ = (N_1/N + p − 1)/(2p − 1).    (4)

Observe that in the above example the probability of receiving "Yes" varies from p to 1 − p depending on the true information of the user; similarly, the probability of receiving "No" also varies from p to 1 − p. Hence, the ratio of the probabilities of one answer under different true values of one user is at most p/(1 − p). By letting p/(1 − p) = e^ε and based on Equation (1), it can be easily verified that the above example satisfies (ln(p/(1 − p)))-LDP. To ensure ln(p/(1 − p)) > 0, we must have p > 1/2. Therefore, RR achieves LDP by providing plausible deniability for the responses of users. In this case, users no longer need to trust a centralized curator, since they only report plausible data. Based on RR, there are plenty of other mechanisms for achieving LDP under different research tasks, which will be introduced in the following sections.
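The smoker example above can be simulated directly (a minimal sketch with hypothetical population numbers; the debiasing step follows the estimator derived above):

```python
import random

def simulate_rr_survey(true_smokers: int, n: int, p: float, seed: int = 42):
    """Simulate the 'Are you a smoker?' survey under randomized response.

    Each user answers truthfully w.p. p and lies w.p. 1 - p; the server
    then debiases the count: f_hat = (N1/N + p - 1) / (2p - 1).
    """
    rng = random.Random(seed)
    f = true_smokers / n           # true proportion of smokers
    n1 = 0                         # number of "Yes" answers received
    for i in range(n):
        truth = 1 if i < true_smokers else 0
        answer = truth if rng.random() < p else 1 - truth
        n1 += answer
    f_hat = (n1 / n + p - 1) / (2 * p - 1)
    return f, f_hat

f, f_hat = simulate_rr_survey(true_smokers=30_000, n=100_000, p=0.75)
assert abs(f - f_hat) < 0.02  # the debiased estimate is close for large N
```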

Comparisons with Global Differential Privacy
We compare LDP with global DP from different perspectives, as shown in Table 2. First, the biggest difference between LDP and DP is that LDP is a local privacy model with no assumption on the server, while DP is a central privacy model that assumes a trusted server. Correspondingly, the general processing frameworks of DP and LDP are different. As shown in the left part of Figure 2, under the DP framework the data are directly sent to the server, and the noise is added to query mechanisms on the server side. In contrast, under the LDP framework, each user's data are locally perturbed on the client side before being uploaded to the server, as shown in the right part of Figure 2. The neighboring datasets in LDP are defined as two different records/values of the input domain, while in DP the neighboring datasets are defined as two datasets that differ in only one record. For example, given a dataset, we can obtain its neighboring dataset by deleting/modifying one record. The two most common perturbation mechanisms for achieving DP are the Laplace mechanism and the Exponential mechanism [45,46], which inject random noise based on the privacy budget and sensitivity. In contrast, the randomized response technique [42,43] is most commonly used to achieve LDP. As shown in Table 2, LDP holds the same sequential composition and post-processing properties as DP. Both DP and LDP are widely adopted by many applications, such as data collection, publishing, and analysis.

LDP Model Settings
This section summarizes the model settings of LDP, which has two paradigms, i.e., the interactive setting and the non-interactive setting [42,47].
Since LDP no longer assumes a trusted third-party data curator, the interactive and non-interactive privacy model settings of LDP [47] are different from that of DP [23]. Figure 3 shows the interactive and non-interactive settings of LDP.
Let v_1, v_2, · · · , v_n ∈ K be the input sequence, and y_1, y_2, · · · , y_n ∈ Y be the corresponding output sequence. As shown in the left part of Figure 3, in the interactive setting, the i-th output y_i depends on the i-th input v_i and the previous i − 1 outputs y_{1:i−1}, but is independent of the previous i − 1 inputs v_{1:i−1}. Formally, the dependence and conditional independence relations can be denoted as {v_i, y_1, · · · , y_{i−1}} → y_i ∧ y_i ⊥ v_j | {v_i, y_1, · · · , y_{i−1}} for any j ≠ i.

In contrast, as shown in the right part of Figure 3, the non-interactive setting is much simpler than the interactive setting: the i-th output y_i depends only on the i-th input v_i. Formally, the dependence and conditional independence relations can be denoted as v_i → y_i ∧ y_i ⊥ {v_j, y_j : j ≠ i} | v_i. Therefore, the main difference between the interactive and non-interactive settings of LDP is whether the correlations between the output results are considered. The work in [48,49] further investigated the power of interactivity in LDP.


The Framework of LDP Algorithm
The general privacy-preserving framework with LDP includes three modules: Randomization, Aggregation, and Estimation, as shown in Algorithm 1. Randomization is conducted on the client side, while both aggregation and estimation happen on the server side.
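The three modules can be sketched as follows (a schematic example that plugs in generalized randomized response as the randomizer; the function names are illustrative, not from Algorithm 1):

```python
import math
import random

def randomize(v: int, k: int, eps: float, rng) -> int:
    """Client side: report v truthfully w.p. p, else a uniform other value."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return v
    return rng.choice([u for u in range(k) if u != v])

def aggregate(reports, k: int):
    """Server side: count how often each value was reported."""
    counts = [0] * k
    for y in reports:
        counts[y] += 1
    return counts

def estimate(counts, k: int, eps: float, n: int):
    """Server side: debias the noisy counts into frequency estimates."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1 / (math.exp(eps) + k - 1)
    return [(c / n - q) / (p - q) for c in counts]

rng = random.Random(0)
k, eps, n = 4, 2.0, 50_000
data = [rng.randrange(k) for _ in range(n)]          # users' true values
reports = [randomize(v, k, eps, rng) for v in data]  # client side
f_hat = estimate(aggregate(reports, k), k, eps, n)   # server side
```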

The Variants of LDP
Since the introduction of LDP, designing variants of LDP has been an important research direction to improve the utility of LDP and to make it more applicable to targeted IoT scenarios. This section summarizes the current research progress on LDP variants, as shown in Table 3.
(ε, δ)-LDP

Definition 2 ((ε, δ)-LDP). A randomized mechanism M satisfies (ε, δ)-LDP if and only if for any pair of input values v, v′ and any possible output y ∈ Y, it holds that

P[M(v) = y] ≤ e^ε · P[M(v′) = y] + δ,

where δ is typically small.

Loosely speaking, (ε, δ)-LDP means that a mechanism M achieves ε-LDP with probability at least 1 − δ. As a relaxation of ε-LDP, (ε, δ)-LDP is more general, since it reduces to ε-LDP in the special case of δ = 0.

BLENDER
BLENDER [52] is a hybrid model combining global DP and LDP, which improves data utility with the desired privacy guarantees. BLENDER separates the user pool into two groups based on their trust in the data aggregator: the opt-in group contains the users who have higher trust in the aggregator, and the clients group contains the remaining users. BLENDER then maximizes data utility by balancing the data obtained from the opt-in users with that of the other users. The privacy definition of BLENDER is the same as (ε, δ)-DP [50].

Local d-Privacy
Geo-indistinguishability [53] was initially proposed for location privacy protection under global DP, and is defined based on the geographical distance of data. Geo-indistinguishability is quite successful when the statistics are distance-sensitive. In the local setting, Alvim et al. [54] also pointed out that metric-based LDP can provide better utility than standard LDP. Therefore, based on d-privacy [55], Alvim et al. [54] proposed local d-privacy, which is defined as follows.

Definition 3 (Local d-Privacy [54]). Given a distance metric d(·, ·), a randomized mechanism M satisfies local d-privacy if and only if for any pair of input values v, v′ and any possible output y ∈ Y, it holds that

P[M(v) = y] ≤ e^{ε·d(v,v′)} · P[M(v′) = y].

Local d-privacy relaxes the privacy constraint by introducing a distance metric when d(v, v′) > 1, thus improving data utility. In other words, the relaxation of local d-privacy is reflected in that two data values become more distinguishable as their distance increases. Therefore, local d-privacy is quite appropriate for distance-sensitive data, such as location data and energy consumption in smart meters.

CLDP
LDP plays an important role in data statistics and analysis. However, standard LDP suffers poor data utility when the number of users is small. To address this, Gursoy et al. [56] introduced condensed local differential privacy (CLDP), which is also a metric-based privacy notion. Let d(·, ·) be a distance metric. Then, CLDP is defined as follows.

Definition 4 (α-CLDP [56]). Given α > 0, a randomized mechanism M satisfies α-CLDP if and only if for any pair of input values v, v′ ∈ K and any possible output y ∈ Y, it holds that

P[M(v) = y] ≤ e^{α·d(v,v′)} · P[M(v′) = y].

By definition, as the distance d increases, α must decrease to compensate; thus, it typically holds that α < ε. In addition, Gursoy et al. [56] adopted a variant of the Exponential Mechanism (EM) to design several protocols that achieve CLDP with better data utility when the number of users is small.

PLDP
Instead of setting a global privacy constraint for all users, personalized local differential privacy (PLDP) [7,57] provides granular privacy constraints for each participating user. That is, under PLDP, each user can select a privacy demand (i.e., ε) according to his/her own preference. Formally, PLDP is defined as follows.

Definition 5 (ε_U-PLDP). A randomized mechanism M satisfies ε_U-PLDP for a user U if and only if for any pair of input values v, v′ ∈ K and any possible output y ∈ Y, it holds that

P[M(v) = y] ≤ e^{ε_U} · P[M(v′) = y],

where ε_U is the privacy budget belonging to user U.
To achieve PLDP, Chen et al. [7] proposed the personalized count estimation (PCE) protocol and further leveraged a user-group clustering algorithm to apply PCE to users with different privacy levels. In addition, Nie et al. [57] proposed an advanced combination strategy to compose multilevel privacy demands with optimal utility.

ULDP
Standard LDP regards all user data as equally sensitive, which leads to excessive perturbation. In fact, not all personal data are equally sensitive. For example, consider a questionnaire question such as "Are you a smoker?" Obviously, "Yes" is a sensitive answer, whereas "No" is not. To improve data utility, utility-optimized LDP (ULDP) [58] was proposed as a new privacy notion that provides privacy guarantees only for sensitive data. In ULDP, let K_S ⊆ K be the sensitive data set and K_N = K \ K_S be the remaining data set. Let Y_P ⊆ Y be the protected data set and Y_I = Y \ Y_P be the invertible data set. Then, ULDP is formally defined as follows.
Definition 6 ((K_S, Y_P, ε)-ULDP). Given K_S ⊆ K and Y_P ⊆ Y, a randomized mechanism M provides (K_S, Y_P, ε)-ULDP if it satisfies the following properties:

(i) For any y ∈ Y_I, there exists a v ∈ K_N such that

P[M(v) = y] > 0 and P[M(v′) = y] = 0 for any v′ ≠ v.    (9)

(ii) For any v, v′ ∈ K and any y ∈ Y_P,

P[M(v) = y] ≤ e^ε · P[M(v′) = y].    (10)

For an intuitive understanding of Definition 6, (K_S, Y_P, ε)-ULDP maps sensitive data v ∈ K_S only to the protected data set. Specifically, we can see from Formula (9) that no privacy protection is provided for non-sensitive data, since each output in Y_I reveals the corresponding input in K_N. We can also see from Formula (10) that (K_S, Y_P, ε)-ULDP provides the same privacy protection as ε-LDP for all sensitive data v ∈ K_S.

ID-LDP
In ULDP, Murakami et al. [58] considered the sensitivity level of input data by directly separating the input data into sensitive data and non-sensitive data. However, Gu et al. [59] further indicated that different data can have distinct sensitivity levels. Thus, they presented input-discriminative LDP (ID-LDP), which is a more fine-grained version of LDP. Let E = {ε_v}_{v∈K} denote the set of privacy budgets of the input values. The notion of ID-LDP is defined as follows.

Definition 7 (E-ID-LDP). Given a privacy budget set E = {ε_v}_{v∈K}, a randomized mechanism M satisfies E-ID-LDP if and only if for any pair of input values v, v′ ∈ K and any possible output y ∈ Y, it holds that

P[M(v) = y] ≤ e^{r(ε_v, ε_{v′})} · P[M(v′) = y],

where r(·, ·) is a function of two privacy budgets.

It can be seen from Definition 7 that ID-LDP introduces the function r(ε_v, ε_{v′}) to quantify the indistinguishability between input values v and v′ that have different privacy levels with budgets ε_v and ε_{v′}. The work in [59] mainly considers the minimum function of ε_v and ε_{v′} and formalizes MinID-LDP as follows.

Definition 8 (MinID-LDP). A randomized mechanism M satisfies MinID-LDP if and only if for any pair of input values v, v′ ∈ K and any possible output y ∈ Y, it holds that

P[M(v) = y] ≤ e^{min(ε_v, ε_{v′})} · P[M(v′) = y].

That is, MinID-LDP always guarantees the worst-case privacy for each pair of inputs. Thus, MinID-LDP ensures better data utility than standard LDP (which provides the worst-case privacy for all data) by providing distinct protection for different inputs.

PBP
In addition, Takagi et al. [60] pointed out that data providers can naturally choose and keep their privacy parameters secret, since LDP perturbations occur on the device side. Thus, they proposed a new privacy model, parameter blending privacy (PBP), as a generalization of standard LDP. PBP can not only keep the privacy parameters secret, but also improve data utility through privacy amplification.
Let Θ be the domain of the privacy parameter. Given a privacy budget θ ∈ Θ, let P(θ) be the ratio of the number of times that θ is chosen to the number of users. Then, PBP is defined as follows.
Definition 9 (r-PBP). A randomized mechanism M satisfies r-PBP iff ∀θ ∈ Θ, v, v′ ∈ K, y ∈ Y, ∃θ′ ∈ Θ such that

P(θ) · P[M_θ(v) = y] ≤ e^{r(θ′)} · P(θ′) · P[M_{θ′}(v′) = y],

where the privacy function r(·) returns a real number that denotes the strength of privacy protection.
Comparisons and discussions. Table 3 briefly summarizes the various LDP variants from different perspectives. With various purposes, these variants extend standard LDP into more generalized or granular versions based on different design ideas, and the main protocols for achieving them have also been proposed. Nonetheless, some issues in these new privacy notions remain unsolved. For example, PBP only focuses on privacy parameters chosen at the user level; in other words, the correlations between data and privacy parameters are neglected in PBP. Similarly, ULDP cannot be directly applied to scenarios in which sensitive and non-sensitive data are correlated. Besides, MinID-LDP uses the minimum function to decide the privacy budget; other functions might provide better data utility.

Frequency Estimation with LDP
This section summarizes the state-of-the-art LDP algorithms for frequency estimation. Frequency estimation, which is equivalent to histogram estimation, aims at computing the frequency of each given value v ∈ K, where |K| = k. Besides, we further subdivide the frequency-based task under LDP into several more specific tasks. In what follows, we introduce each LDP protocol from the perspectives of randomization, aggregation, and estimation, as described in Section 2.2.
Based on the Definition 1 of LDP, a more visual definition of LDP protocol [35] can be given as follows.
Definition 10 (ε-LDP Protocol). Consider two probabilities p > q. A local protocol given by M, such that a user reports the true value with probability p and reports each of the other values with probability q, satisfies ε-LDP if and only if p ≤ q · e^ε.
Based on Theorem 2 in [35], the variance of the estimated count N̂_v of the value v among N users is

Var[N̂_v] = N · (q(1 − q) + f_v(p − q)(1 − p − q))/(p − q)²,

where f_v is the frequency of the value v ∈ K. Thus, the variance of the estimated frequency f̂_v = N̂_v/N is

Var = q(1 − q)/(N(p − q)²) + f_v(1 − p − q)/(N(p − q)).    (13)

It can be seen that the variance in Equation (13) is dominated by the first term when f_v is small. Hence, the approximation of the variance in Equation (13) can be denoted as

Var* = q(1 − q)/(N(p − q)²).    (14)

In addition, it holds that Var* = Var when p + q = 1.
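The relation between the full and approximate variances can be checked numerically (a minimal sketch; Var refers to Equation (13) and Var* to Equation (14)):

```python
import math

def var_full(p: float, q: float, f: float, n: int) -> float:
    """Equation (13): q(1-q)/(N(p-q)^2) + f(1-p-q)/(N(p-q))."""
    return q * (1 - q) / (n * (p - q) ** 2) + f * (1 - p - q) / (n * (p - q))

def var_approx(p: float, q: float, n: int) -> float:
    """Equation (14): the f-independent dominant term."""
    return q * (1 - q) / (n * (p - q) ** 2)

# When p + q = 1, the second term of Equation (13) vanishes, so Var* = Var.
p = math.exp(1.0) / (math.exp(1.0) + 1)
assert var_full(p, 1 - p, 0.3, 1000) == var_approx(p, 1 - p, 1000)
```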

General Frequency Estimation on Categorical Data
This section summarizes the general LDP protocols for frequency estimation on categorical data and shows the performance of each protocol. The encoding principles of the existing LDP protocols can be categorized into direct perturbation, unary encoding, hash encoding, transformation, and subset selection.

Direct Perturbation
The most basic building block for achieving LDP is direct perturbation that perturbs data directly by randomization.
Binary Randomized Response (BRR) [43,63] is the basic randomized response technique for binary values, i.e., the case where the cardinality of the value domain is 2, as introduced in Section 2.1.2. BRR is formally defined as follows.
Randomization. Each value v ∈ {0, 1} is perturbed by

P[v* = v] = e^ε/(e^ε + 1),  P[v* = 1 − v] = 1/(e^ε + 1).

Aggregation and Estimation. Let N_v be the total number of received values equal to v after aggregation. The estimated frequency f̂_v of the value v can be computed as

f̂_v = ((e^ε + 1) · N_v/N − 1)/(e^ε − 1).

Observe that the probability that v* = v varies from 1/(e^ε + 1) to e^ε/(e^ε + 1). The ratio of the respective probabilities for different values of v is at most e^ε. Therefore, BRR satisfies ε-LDP. Based on Equation (14), the variance of BRR is

Var* = e^ε/(N(e^ε − 1)²).

Generalized Randomized Response (GRR) [64,65] extends BRR to the case where the cardinality of the value domain is more than 2, i.e., k > 2. GRR is also called Direct Encoding (DE) in [35] or k-RR in [64]. The process of GRR is as follows.
Randomization. Each value v is perturbed by

P[v* = v′] = e^ε/(e^ε + k − 1), if v′ = v;  1/(e^ε + k − 1), otherwise.    (17)

Aggregation and Estimation. Let N_v be the total number of received values equal to v after aggregation. The estimated frequency f̂_v of the value v can be computed as

f̂_v = ((e^ε + k − 1) · N_v/N − 1)/(e^ε − 1).

Observe that the probability that v* = v varies from 1/(e^ε + k − 1) to e^ε/(e^ε + k − 1). The ratio of the respective probabilities for different values of v is at most e^ε. Therefore, GRR satisfies ε-LDP. Based on Equation (14), the variance of GRR is

Var* = (e^ε + k − 2)/(N(e^ε − 1)²).
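As a quick numeric illustration of how GRR's approximate variance grows linearly with the domain size k (a sketch, not from [35]):

```python
import math

def grr_var(eps: float, k: int, n: int) -> float:
    """Approximate variance of GRR: (e^eps + k - 2) / (N (e^eps - 1)^2)."""
    e = math.exp(eps)
    return (e + k - 2) / (n * (e - 1) ** 2)

# GRR's variance grows linearly in k, so large domains degrade accuracy.
n, eps = 10_000, 1.0
v_small = grr_var(eps, 2, n)       # binary domain (BRR)
v_large = grr_var(eps, 1024, n)    # large categorical domain
assert v_large > 100 * v_small
```

This is the reason unary-encoding and hashing protocols, whose variance does not depend on k, are preferred for large domains.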

Unary Encoding
Instead of perturbing the original value, we can perturb each bit of a vector generated by encoding the original value v. This method is called Unary Encoding (UE) [35] and is achieved as follows.
Randomization. UE encodes each value v ∈ K into a binary bit vector B of size k, where only the v-th bit is 1, i.e., B = [0, · · · , 0, 1, 0, · · · , 0]. Each bit of B is perturbed by

P[B*[i] = 1] = p, if B[i] = 1;  q, if B[i] = 0,    (20)

where p > q.

Aggregation and Estimation. Assume the numbers of ones in the v-th bit among all original N vectors and all received N vectors are N¹_v and N̄¹_v, respectively. Based on Equation (20), we have

E[N̄¹_v] = p · N¹_v + q · (N − N¹_v).

Then, the frequency f_v of the value v is computed as

f̂_v = (N̄¹_v/N − q)/(p − q).    (21)

Based on Equation (20), for any inputs v_1 ∈ K and v_2 ∈ K and any output B*, it holds that

P[B*|v_1]/P[B*|v_2] = ∏_{i∈[k]} P[B*[i]|v_1]/P[B*[i]|v_2] ≤ (p(1 − q))/(q(1 − p)),

where "≤" is achieved since the bit vectors of v_1 and v_2 differ only in positions v_1 and v_2. There are four cases when choosing the output bits at positions v_1 and v_2, i.e., (0, 0), (0, 1), (1, 0), and (1, 1). It can be verified that an output vector with position v_1 being 1 and position v_2 being 0 maximizes the ratio (i.e., case 3, (1, 0)).
Based on Equation (20), UE satisfies ε-LDP if and only if

(p(1 − q))/(q(1 − p)) ≤ e^ε.    (24)

Therefore, letting the equality in Equation (24) hold, we can set p as follows:

p = (e^ε · q)/(1 − q + e^ε · q).    (25)

Applying Equation (25) to Equation (14), the variance of UE is

Var* = (1 − q + e^ε q)²/(N · q(1 − q)(e^ε − 1)²).    (26)

Symmetric UE (SUE) [35] is the symmetric version of UE that chooses p and q such that p + q = 1. Based on this observation and Equation (25), we can derive p = e^{ε/2}/(e^{ε/2} + 1) and q = 1/(e^{ε/2} + 1). Then, the frequency can be computed based on Equation (21). The variance of SUE is

Var*_SUE = e^{ε/2}/(N(e^{ε/2} − 1)²).    (27)

Optimized UE (OUE) [35] minimizes Equation (26). By setting the partial derivative of Equation (26) with respect to q equal to 0 and solving it, we can obtain

p = 1/2,  q = 1/(e^ε + 1).    (28)

The estimated frequency can be computed by Equation (21). By combining Equations (26) and (28), the variance of OUE is

Var*_OUE = 4e^ε/(N(e^ε − 1)²).    (29)
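A compact OUE sketch with the optimized parameters p = 1/2 and q = 1/(e^ε + 1) (an illustrative implementation, assuming values are indices 0..k−1; not the code of [35]):

```python
import math
import random

def oue_randomize(v: int, k: int, eps: float, rng) -> list:
    """Encode v as a one-hot bit vector and perturb each bit with OUE's
    optimized probabilities: P[bit=1] is p = 1/2 for the 1-bit and
    q = 1/(e^eps + 1) for each 0-bit."""
    q = 1 / (math.exp(eps) + 1)
    bits = []
    for i in range(k):
        one_prob = 0.5 if i == v else q
        bits.append(1 if rng.random() < one_prob else 0)
    return bits

def oue_estimate(reports, k: int, eps: float) -> list:
    """Debias the per-position counts of ones: f = (ones/N - q)/(p - q)."""
    n = len(reports)
    p, q = 0.5, 1 / (math.exp(eps) + 1)
    ones = [sum(r[i] for r in reports) for i in range(k)]
    return [(c / n - q) / (p - q) for c in ones]

rng = random.Random(1)
k, eps, n = 8, 1.0, 20_000
data = [rng.randrange(k) for _ in range(n)]
reports = [oue_randomize(v, k, eps, rng) for v in data]
f_hat = oue_estimate(reports, k, eps)
```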

Hash Encoding
In the same way as UE, Basic RAPPOR [27] encodes each value v ∈ K into a length-k binary bit vector B and conducts Randomization with the following two steps.
Step 1: Permanent randomized response. Generate B_1 from B with the probability

P[B_1[i] = 1] = 1 − r/2, if B[i] = 1;  r/2, if B[i] = 0,

where r is a user-tunable parameter that controls the level of longitudinal privacy guarantee.
Step 2: Instantaneous randomized response. Perturb B_1 with the probability distribution of UE (i.e., Equation (20)) to obtain the report.

From the proof in [27], the permanent randomized response (i.e., Step 1) achieves ε-LDP for ε = 2 ln((1 − r/2)/(r/2)). The communication and computing cost of Basic RAPPOR is Θ(k) for each user, and Θ(Nk) for the aggregator. However, Basic RAPPOR does not scale with the cardinality k. RAPPOR [27] therefore employs Bloom filters [66] to encode each single element based on a set of m hash functions H = {H_1, H_2, · · · , H_m}. Each hash function outputs an integer in {0, 1, · · · , k − 1}. Then, each value v is encoded as a k-bit binary vector B, where B[j] = 1 if H_i(v) = j for some H_i ∈ H, and B[j] = 0 otherwise.
From the proof in [27], RAPPOR achieves ε-LDP for ε = 2m ln((1 − r/2)/(r/2)). Moreover, the communication cost of RAPPOR is Θ(k) for each user. However, the computation cost of the aggregator in RAPPOR is higher than that of Basic RAPPOR due to the LASSO regression.
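The Bloom-filter encoding step can be sketched as follows (an illustrative construction using salted SHA-256 as a stand-in for RAPPOR's hash functions; the parameters are hypothetical):

```python
import hashlib

def bloom_encode(value: str, k: int, m: int) -> list:
    """Encode a value into a k-bit Bloom filter using m hash functions,
    each simulated by salting SHA-256 with the function index."""
    bits = [0] * k
    for i in range(m):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:8], "big") % k] = 1
    return bits

b = bloom_encode("example.com", k=128, m=2)
assert sum(b) in (1, 2)  # the m hash functions may collide
```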
O-RAPPOR [64] is proposed to address the problem of having no prior knowledge about the attribute domain. Kairouz et al. [64] examined discrete distribution estimation when the open alphabets of categorical attributes cannot be enumerated in advance. They first apply hash functions to map the underlying values; the hashed values are then involved in a perturbation process that is independent of the original values. On the basis of RAPPOR, Kairouz et al. adopted the idea of hash cohorts. Each user u_i is assigned to a cohort c_i that is sampled i.i.d. from a uniform distribution over C = {1, · · · , C}. Each c ∈ C provides an independent view of the underlying distribution of strings. Based on hash cohorts, O-RAPPOR applies hash functions on a value v in cohort c before using RAPPOR, generating an independent h-bit Bloom filter BLOOM_c for each cohort c, where the j-th bit of BLOOM_c is 1 if HASH_{c,i}(v) = j for some i ∈ {1, · · · , h}. Next, the perturbation on BLOOM_c follows the same strategy as in RAPPOR.
O-RR [64] is proposed to deal with non-binary attributes. It integrates hash cohorts into k-RR to deal with the case where the domain of the attribute is unknown. Users in a cohort c use their cohort hash function to project the value space into k disjoint subsets, i.e., x_i = HASH_c(v) mod k, denoted as HASH^k_c(v). Next, O-RR perturbs the input value v as follows:

P[⟨c, y⟩] = e^ε/(C(e^ε + k − 1)), if y = HASH^k_c(v);  1/(C(e^ε + k − 1)), otherwise.    (33)

Please note that Equation (33) contains a factor of C compared to Equation (17). This is because each value v belongs to one of the C cohorts. The error bound of O-RR is the same as that of k-RR, but O-RR incurs more time cost due to the hash and cohort operations.
To reduce communication and computation cost, local hashing (LH) [35] is proposed to hash the input value into a domain [g] such that g < k. Denote by H the universal hash function family, where each hash function H ∈ H hashes an input value into a value in [g]. The universal property requires that

∀v_1, v_2 ∈ [k], v_1 ≠ v_2: P_{H∈H}[H(v_1) = H(v_2)] ≤ 1/g.

Randomization. Given any input value v ∈ [k], LH first outputs a value x ∈ [g] by hashing, i.e., x = H(v). Then, LH perturbs x with the following distribution:

P[y = x′] = p = e^ε/(e^ε + g − 1), if x′ = x;  q = 1/(e^ε + g − 1), otherwise.    (35)

After perturbation, each user sends ⟨H, y⟩ to the aggregator. Based on Equation (35), we know that LH satisfies ε-LDP, since it always holds that p ≤ q · e^ε.
Aggregation and Estimation. Assume we aim to estimate the frequency f_v of the value v. The aggregator counts the total number of reports ⟨H, y⟩ that support the value v, denoted as θ. That is, for each report ⟨H, y⟩, if it holds that H(v) = y, then θ = θ + 1. Based on Equation (35), it holds that

p* = p,  q* = (1/g) · p + ((g − 1)/g) · q = 1/g,    (36)

where p* is the probability that a report supports the user's true value and q* is the probability that it supports any other given value. Then, while aggregating in the server, we have

E[θ] = N( f_v · p* + (1 − f_v) · q*).    (37)

Based on Equations (36) and (37), we can get the estimated frequency of the value v, i.e.,

f̂_v = (θ/N − q*)/(p* − q*) = (θ/N − 1/g)/(p − 1/g).    (38)

By taking p = p* and q = q* into Equation (14), the variance of LH is

Var* = (e^ε + g − 1)²/(N(g − 1)(e^ε − 1)²).    (39)

Local hashing becomes Binary Local Hashing (BLH) [35] when g = 2. In BLH, each hash function H ∈ H hashes an input from [k] into one bit.
Randomization. Based on Equation (35) with g = 2, the randomization of BLH follows the probability distribution

P[y = x] = e^ε/(e^ε + 1),  P[y = 1 − x] = 1/(e^ε + 1).

Aggregation and Estimation. Based on LH, it holds that p* = p and q* = (1/2)p + (1/2)q = 1/2. When the reported support count of the value v is θ, based on Equation (38), the estimated frequency can be computed as

f̂_v = ((e^ε + 1)(2θ − N))/(N(e^ε − 1)).

The variance of BLH is

Var*_BLH = (e^ε + 1)²/(N(e^ε − 1)²).

Optimized LH (OLH) [35] chooses an optimized g to balance the information losses between the hashing step and the randomization step. Based on Equation (39), we can minimize the variance of LH by setting the partial derivative of Equation (39) with respect to g equal to 0, which is equivalent to solving

(e^ε − 1)² · g − (e^ε − 1)²(e^ε + 1) = 0.

By solving it, the optimal g is g = e^ε + 1, where g = ⌈e^ε + 1⌉ in practice. When the reported support count of the value v is θ, based on Equation (38), the estimated frequency is

f̂_v = 2(gθ − N)/(N(e^ε − 1)).

In addition, the variance of OLH is

Var*_OLH = 4e^ε/(N(e^ε − 1)²).
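An illustrative OLH sketch (seeded SHA-256 stands in for the universal hash family, and g is rounded to an integer; this is a sketch, not the implementation of [35]):

```python
import hashlib
import math
import random

def h(seed: int, v: int, g: int) -> int:
    """A stand-in universal hash: map value v into [g] under a random seed."""
    digest = hashlib.sha256(f"{seed}:{v}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % g

def olh_randomize(v: int, eps: float, rng):
    """OLH: hash into [g] with g ≈ e^eps + 1, then apply GRR over [g]."""
    g = int(round(math.exp(eps) + 1))
    seed = rng.getrandbits(32)       # identifies the hash function H
    x = h(seed, v, g)
    p = math.exp(eps) / (math.exp(eps) + g - 1)
    y = x if rng.random() < p else rng.choice([u for u in range(g) if u != x])
    return seed, y

def olh_estimate(reports, k: int, eps: float, n: int) -> list:
    """Count supports (h(v) == y) and debias: f = (theta/N - 1/g)/(p - 1/g)."""
    g = int(round(math.exp(eps) + 1))
    p = math.exp(eps) / (math.exp(eps) + g - 1)
    f_hat = []
    for v in range(k):
        theta = sum(1 for seed, y in reports if h(seed, v, g) == y)
        f_hat.append((theta / n - 1 / g) / (p - 1 / g))
    return f_hat

rng = random.Random(7)
k, eps, n = 6, 1.0, 20_000
data = [rng.randrange(k) for _ in range(n)]
reports = [olh_randomize(v, eps, rng) for v in data]
f_hat = olh_estimate(reports, k, eps, n)
```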

Transformation
The transformation-based method is usually adopted to reduce the communication cost.
S-Hist [61] is proposed to produce a succinct histogram that contains the most frequent items (i.e., "heavy hitters"). Bassily and Smith [61] proved that S-Hist achieves asymptotically optimal accuracy for succinct histogram estimation. S-Hist randomly selects only one bit from the encoded vector based on random matrix projection, which reduces the communication cost. The specific process of S-Hist, which includes an additional initialization step, is as follows.
Initialization. The aggregator generates a random projection matrix Φ ∈ {−1/√b, 1/√b}^{b×k}. The magnitude of each column vector in Φ is 1, and the inner product of any two different column vectors is 0. Here, b is a constant parameter determined by the error bound, where the error is defined as the maximum distance between the estimated and true frequencies, i.e., max_{v∈K} | f̂_v − f_v|.
Randomization. Assume the input value v is the v-th element of the domain K. We encode v as Encode(v) = ⟨j, x⟩, where j is chosen uniformly at random from [b], and x is the v-th element of the j-th row of Φ, i.e., x = Φ[j, v]. Then, we randomize x as z = c·b·x with probability e^ε/(e^ε + 1), and z = −c·b·x otherwise, where c = (e^ε + 1)/(e^ε − 1). After perturbation, each user u_i (i ∈ [1, N]) sends ⟨j_i, z_i⟩ to the aggregator.

Aggregation and Estimation. Upon receiving the report ⟨j_i, z_i⟩ of each user u_i, the estimation for the v-th element of K is computed by f̂_v = (1/N)·Σ_{i=1}^{N} z_i·Φ[j_i, v]. Based on Equation (44), it is easy to see that S-Hist satisfies ε-LDP for every choice of the index j.
Furthermore, Bassily and Smith [61] proved that the L∞-error of S-Hist is bounded by O((1/ε)·√(log(k/β)/N)) with probability at least 1 − β.
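The projection, randomization, and estimation steps can be sketched as follows. This is a minimal, non-optimized illustration of the random-projection idea, not the implementation from [61]; the function names and matrix construction are ours.

```python
import math
import random

def shist_init(k: int, b: int, seed: int = 42):
    """Shared random projection matrix Phi in {-1/sqrt(b), +1/sqrt(b)}^{b x k}."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(b)
    return [[rng.choice((-s, s)) for _ in range(k)] for _ in range(b)]

def shist_randomize(phi, v: int, epsilon: float):
    """Client: pick a random row index j and randomize the scaled sign of Phi[j][v]."""
    b = len(phi)
    j = random.randrange(b)
    x = phi[j][v]
    c = (math.exp(epsilon) + 1) / (math.exp(epsilon) - 1)
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    z = c * b * x if random.random() < p else -c * b * x
    return j, z

def shist_estimate(phi, reports, v: int) -> float:
    """Server: average z_i * Phi[j_i][v]; cross terms cancel in expectation."""
    return sum(z * phi[j][v] for j, z in reports) / len(reports)
```

Each user transmits only an index and one scaled bit, which is the source of the low communication cost.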
Hadamard Randomized Response (HRR) [26,31,67] is a useful tool to handle sparsity by transforming the information contained in sparse vectors into a different orthonormal basis. HRR adopts the Hadamard transform (HT) to handle the situation where the inputs and marginals of individual users are sparse. HT is the discrete Fourier transform over the Boolean hypercube, described by an orthogonal and symmetric matrix φ of dimension k̄ × k̄, where k̄ is the smallest power of two no less than k. Each entry of φ is φ_{i,j} = k̄^{−1/2}·(−1)^{⟨i,j⟩}, where ⟨i,j⟩ denotes the number of 1's that i and j agree on in their binary representations. When a value v_i is represented as a sparse binary vector B_i, the full Hadamard transform of the input is the B_i-th column of φ, i.e., the vector of Hadamard coefficients.

Randomization. User i samples an index j ∈ [k̄] and perturbs the coefficient φ_{B_i,j} ∈ {−1, 1} (with the normalization factor scaled out) by using BRR, which keeps the true value with probability p and flips it with probability 1 − p. Then, user i reports the perturbed coefficient o_i and the index j to the aggregator. As we can see, the communication cost is O(log k + 1) = O(log k).
Aggregation and Estimation. Assume the observed sum of all received perturbed coefficients with index j is O_j, reported by N_j users. Then, the unbiased estimation of the j-th Hadamard coefficient o_j (with the normalization factor scaled out) is computed by ô_j = O_j/(N_j·(2p − 1)). In this way, the aggregator can compute the unbiased estimations of all coefficients and apply the inverse transform to produce the final frequency estimation f̂. Based on the proof in [31,68], the variance of HRR is 4p(1 − p)/(N(2p − 1)²). By setting p = e^ε/(e^ε + 1) to ensure ε-LDP, the variance is 4e^ε/(N(e^ε − 1)²). Thus, HRR provides a good compromise between accuracy and communication cost. Besides, the computation overhead at the aggregator is O(N + k log k), versus O(Nk) for OLH.
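A toy version of HRR for a power-of-two domain can be sketched as follows. The function names are hypothetical, the normalization factor is scaled out, and the inverse transform is computed naively rather than with a fast transform.

```python
import math
import random

def had(i: int, j: int) -> int:
    """Unnormalized Hadamard entry phi_{i,j} = (-1)^{popcount(i & j)}."""
    return -1 if bin(i & j).count("1") % 2 else 1

def hrr_randomize(v: int, k: int, epsilon: float):
    """Client: sample one coefficient index and perturb the +/-1 coefficient with BRR."""
    j = random.randrange(k)
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    coeff = had(v, j)
    o = coeff if random.random() < p else -coeff
    return j, o

def hrr_estimate(reports, k: int, epsilon: float):
    """Server: debias each sampled coefficient, then invert the Hadamard transform."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    sums, counts = [0.0] * k, [0] * k
    for j, o in reports:
        sums[j] += o
        counts[j] += 1
    o_hat = [s / max(c, 1) / (2 * p - 1) for s, c in zip(sums, counts)]
    return [sum(o_hat[j] * had(v, j) for j in range(k)) / k for v in range(k)]
```

Each user reports a single ±1 value plus an index, which illustrates why the communication cost stays logarithmic in the domain size.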
Furthermore, Acharya et al. [67] designed a general family of LDP schemes based on Hadamard matrices and showed how to choose from this family the optimal privatization scheme, achieving high privacy with lower communication cost and higher efficiency.

Subset Selection
The main idea of subset selection is to randomly select ω items from the domain K. The ω-Subset Mechanism (ω-SM) [69,70] reports a random subset z of size ω of the original attribute domain K, i.e., z ⊆ K. Essentially, the output z is an element of the power set of the data domain K. For any input v ∈ K and output z ⊆ K with |z| = ω, the conditional probabilities are Pr[z|v] = e^ε/Ω if v ∈ z and Pr[z|v] = 1/Ω otherwise, where Ω = e^ε·C(k−1, ω−1) + C(k−1, ω) is the normalizer. As we can see, when ω = 1, 1-SM is equivalent to the generalized randomized response (GRR) mechanism.
Randomization. Based on Equation (48), the randomization procedure of ω-SM is shown in Algorithm 2. Observe that the core part of the randomization is randomly sampling ω − 1 or ω elements from K − {v} without replacement.

Aggregation and Estimation. Denote f_i and f̂_i as the real and the received frequency of the i-th value v_i, respectively. Upon receiving a private view z_i, the aggregator increases f̂_i for each v_i ∈ z_i. Based on Algorithm 2, the true positive rate is θ_tr = ωe^ε/(ωe^ε + k − ω), which is the probability of v_i staying in the output when the input value is v_i. In addition, the false positive rate is θ_fr = ωe^ε/(ωe^ε + k − ω)·(ω − 1)/(k − 1) + (k − ω)/(ωe^ε + k − ω)·ω/(k − 1), which is the probability of v_i showing up in the private view when the input value is v_{i′} (i′ ≠ i). Therefore, the expectation of f̂_i is E[f̂_i] = f_i·θ_tr + (1 − f_i)·θ_fr. Thus, the estimated frequency of f_i is (f̂_i − θ_fr)/(θ_tr − θ_fr).

Comparisons. Table 4 summarizes the general LDP protocols from the perspective of encoding principles. The error bound is measured by the L∞-norm. BRR and GRR are direct perturbation-based methods, which are suitable for low-dimensional data.
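The ω-SM randomization and estimation steps can be sketched as follows; this is a minimal illustration (Algorithm 2 itself is not reproduced, and the function names are ours).

```python
import math
import random

def wsm_randomize(v: int, k: int, w: int, epsilon: float):
    """Client: report a size-w subset of [k]; the true value survives w.p. theta_tr."""
    e = math.exp(epsilon)
    others = [u for u in range(k) if u != v]
    if random.random() < w * e / (w * e + k - w):
        # keep v, plus w-1 other items sampled without replacement
        return {v} | set(random.sample(others, w - 1))
    return set(random.sample(others, w))

def wsm_estimate(reports, k: int, w: int, epsilon: float, v: int) -> float:
    """Server: count subsets containing v, then invert E[support] via theta_tr/theta_fr."""
    e = math.exp(epsilon)
    tr = w * e / (w * e + k - w)
    fr = tr * (w - 1) / (k - 1) + (k - w) / (w * e + k - w) * w / (k - 1)
    support = sum(1 for z in reports if v in z) / len(reports)
    return (support - fr) / (tr - fr)
```

Setting w = 1 recovers GRR: the output is a singleton that equals the true value with probability e^ε/(e^ε + k − 1).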
Discussions. Section 3.1 discusses the general frequency estimation protocols with LDP. When focusing on multiple attributes (i.e., d-dimensional data), we can directly use the above protocols to estimate the frequency of each attribute. We can also estimate joint frequency distributions of multiple attributes with the above protocols by regarding the Cartesian product of the values of the multiple attributes as the total domain. However, the error bound will be d times greater than when dealing with a single attribute. Even worse, the total domain cardinality increases exponentially with the dimension d, which leads to huge computation overhead and low data utility. Therefore, several studies [31,71-75] have investigated estimating joint probability distributions of d-dimensional data, which are summarized in Section 3.6.

Frequency Estimation on Set-Valued Data
This section summarizes the mechanisms for frequency estimation on set-valued data, including items distribution estimation, frequent items mining, and frequent itemsets mining.
Set-valued data denotes a set of items. Let K = {v_1, v_2, · · · , v_k} be the domain of items. The set-valued data V_i of user i is denoted as a subset of K, i.e., V_i ⊆ K. Different users may have different numbers of items. Table 5 shows a set-valued dataset of six users with item domain K = {A, B, C, D, E, F}. In what follows, we introduce each frequency-based task on set-valued data. Table 5. Example of set-valued dataset.

Item Distribution Estimation
The basic frequency estimation task on set-valued data is to analyze the distribution over the k items. For an item v ∈ K, its frequency is defined as the fraction of times that v occurs. That is, let c_v be the number of users whose data include v; then c_v := |{V_i | v ∈ V_i}| and f_v = c_v/N. To tackle set-valued data with LDP, there are two tough challenges [76]. (i) Huge domain: supposing there are k items in total and each user has at most l items, the number of possible combinations of items for each user could reach C(k, l) + C(k, l−1) + · · · + C(k, 0). (ii) Heterogeneous size: different users may have different numbers of items, varying from 0 to l. To address the heterogeneous size, one of the most general methods is to add padding items to undersized records. Then, a naive method is to treat the set-valued data as categorical data and use the LDP algorithms in Section 3.1. However, this method needs to divide the privacy budget into smaller parts, thereby introducing excessive noise and reducing data utility.
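The padding idea can be sketched in a few lines. `DUMMY` and the function name are hypothetical, and the LDP perturbation of the sampled item is omitted; this only shows how heterogeneous record sizes are normalized before sampling.

```python
import random

DUMMY = "__PAD__"

def pad_and_sample(items, l: int):
    """Pre-process a user's item set to exactly l items, then sample one uniformly.
    Reporting a single item lets the full privacy budget be spent on one report."""
    items = list(items)
    if len(items) > l:
        items = random.sample(items, l)        # oversized record: truncate by sampling
    items += [DUMMY] * (l - len(items))        # undersized record: pad with dummies
    return random.choice(items)
```

On the server side, observed item frequencies must be scaled by l to account for the 1/l sampling rate, and reports of the dummy item are discarded.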
Wang et al. [76] proposed the PrivSet mechanism, which has a computational overhead linear in the item domain size. PrivSet pre-processes the number of items of each user to l, which addresses the issue of heterogeneous size. When the item size is beyond l, it simply truncates or randomly samples l items from the original set. When the item size is under l, it adds padding items to the original set. After pre-processing, each padded set-valued data s belongs to S = {b | b ⊆ K′ and |b| = l}, where K′ is the item domain after padding. Then, PrivSet randomizes the data based on the exponential mechanism. For each padded data s ∈ S, the randomization component of PrivSet selects and outputs an element t ∈ T = {a | a ⊆ K′ and |a| = l} with probability e^ε/Ω if t ∩ s ≠ ∅ and 1/Ω otherwise, where Ω is the probability normalizer and equals C(k, l) + e^ε·(C(k+l, l) − C(k, l)). As analyzed in [76], the randomization of PrivSet reduces the computation cost from O(C(k+l, l)) to O(k), which is linear in the item domain size. PrivSet also achieves a lower error bound than other mechanisms.
Moreover, LDPart [77] is proposed to generate sanitized location-record data with LDP, where location-record data are treated as a special case of set-valued data. LDPart uses a partition tree to greatly reduce the domain space and leverages OUE [35] to perturb the input values. However, the utility of LDPart relies heavily on two parameters, i.e., the counting threshold and the maximum length of the record, and it is difficult to calculate the optimal values of these two parameters.

Frequent Items Mining
Frequent items mining (also known as heavy hitters identification, top-ω hitters mining, or frequent terms discovery) plays an important role in data statistics. Based on the notations in Section 3.2.1, we say an item v is ω-heavy (ω-frequent) if its multiplicity is at least ω, i.e., c_v ≥ ω. The task of frequent items mining is to identify all ω-heavy hitters from the collected data. For example, as shown in Table 5, the 3-heavy items are A, D, and E.
Bassily and Smith [61] focused on producing a succinct histogram that contains the most frequent items of the data under LDP. They leveraged the random matrix projection to achieve much lower communication cost and error bound than that of earlier methods in [78,79]. The specific process of S-Hist is introduced in Section 3.1. Furthermore, a follow-up work in [80] proposed TreeHist that computes heavy hitters from a large domain. TreeHist transforms the user's value into a binary string and constructs a binary prefix tree to compute frequent strings, which improves both efficiency and accuracy. Moreover, with strong theoretical analysis, Bun et al. [81] proposed a new heavy hitter mining algorithm that achieves the optimal worst-case error as a function of the domain size, the user number, the privacy budget, and the failure probability.
To address the challenge that the number of items in each user record differs, Qin et al. [30] proposed a Padding-and-Sampling frequency oracle (PSFO) that first pads each user's items to a uniform length l by adding dummy items and then lets each user sample one item from the possessed items with the same sampling rate. They designed LDPMiner based on RAPPOR [27] and S-Hist [61]. LDPMiner adopts a two-phase strategy with privacy budgets ε_1 and ε_2, respectively. In phase 1, LDPMiner identifies the potential candidate set of frequent items (i.e., the top-ω frequent items) by using a randomized protocol with ε_1. The aggregator broadcasts the candidate set to all users. In phase 2, LDPMiner refines the frequent items from the candidates with the remaining privacy budget ε_2 and outputs the frequencies of the final frequent items. LDPMiner is much wiser on budget allocation than the naive method. However, LDPMiner still needs to split the privacy budget into 2l parts over the two phases, which limits the data utility.
Wang et al. [75] proposed a prefix extending method (PEM) to discover heavy hitters from an extremely large domain (e.g., k = 2^128). To address the computing challenge, PEM iteratively identifies increasingly longer frequent prefixes based on a binary prefix tree. Specifically, PEM first divides users into g equal-size groups, making each group i (1 ≤ i ≤ g) associate with a particular prefix length s_i, such that 1 < s_1 < s_2 < · · · < s_g = log k. Then each user reports the private value using an LDP protocol, and the server iterates through the groups. Obviously, the number of groups g is a key parameter that influences both the computation complexity and the data utility. Therefore, Wang et al. [75] further designed a sensitivity threshold principle that computes a threshold to control the false positives, thus maintaining the effectiveness and accuracy of PEM. Jia et al. [82] have pointed out that prior knowledge can be used to improve the data utility of LDP algorithms. Thus, Calibrate was designed to incorporate the prior knowledge via statistical inference; it can be appended to existing LDP algorithms to reduce estimation errors and improve data utility.
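The prefix-extension step of PEM can be sketched as follows. This shows only the candidate-growing logic over the binary prefix tree; the LDP reports from each user group and the frequency thresholding that prune the candidates are omitted, and the function name is ours.

```python
from itertools import product

def extend_prefixes(frequent_prefixes, step: int):
    """One PEM-style extension round: each surviving frequent prefix is grown by
    `step` more bits, forming the candidate set the next user group reports on."""
    return [p + "".join(bits)
            for p in frequent_prefixes
            for bits in product("01", repeat=step)]
```

Because only prefixes found frequent in earlier rounds are extended, the candidate set stays far smaller than the full domain of 2^{log k} values.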

Frequent Itemset Mining
Frequent itemset mining is similar to frequent items mining, except that the desired results of the former are a set of itemsets rather than items. Frequent itemset mining is much more challenging since the domain size of itemsets increases exponentially.
Following the definition of frequent items mining in Section 3.2.2, the frequency of any itemset v ⊆ K is defined as the fraction of times that itemset v occurs. That is, f_v := |{V_i | v ⊆ V_i}|/N. The count of any itemset v ⊆ K is defined as the total number of users whose data include v as a subset, i.e., c_v := |{V_i | v ⊆ V_i}|. An ω-heavy itemset v is such that its multiplicity is at least ω, i.e., c_v ≥ ω. The task of frequent itemset mining is to identify all ω-heavy itemsets from the collected data; for example, the 3-heavy itemsets of Table 5 can be identified in the same way.

Sun et al. [83] proposed a personalized frequent itemset mining algorithm that provides different privacy levels for different items under LDP. They leveraged the randomized response technique [43] to distort the original data with personalized privacy parameters and reconstructed itemset supports from the distorted data. This method distorts each item in the domain separately, which leads to an overhead that is super-linear in k. As introduced in Section 3.2.2, LDPMiner mines frequent items over set-valued data by using a PSFO protocol. Inspired by LDPMiner, Wang et al. [84] also padded users' items to size l and sampled one item from the possessed items of each user, which ensures that frequent items can be reported with high probability even though there still exist unsampled items. Specifically, Wang et al. [84] designed a Set-Value Item Mining (SVIM) protocol, and then, based on the results from SVIM, they proposed a more efficient Set-Value ItemSet Mining (SVSM) protocol to find frequent itemsets. To further improve the data utility of SVSM, they also investigated the best-performing LDP protocol for each usage of PSFO by identifying the privacy amplification property of each LDP protocol.
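The definitions of c_v and ω-heavy itemsets can be checked with a few lines of non-private code; the toy dataset and function names below are ours, for illustration only.

```python
def itemset_count(dataset, itemset):
    """c_v: the number of user records that contain `itemset` as a subset."""
    return sum(1 for record in dataset if itemset <= record)

def heavy_itemsets(dataset, candidates, omega):
    """Identify the omega-heavy itemsets among the given candidate itemsets."""
    return [s for s in candidates if itemset_count(dataset, s) >= omega]
```

The hard part under LDP is that the candidate space is exponential in k, which is exactly what SVIM/SVSM-style candidate pruning addresses.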

New Terms Discovery
This section introduces the task of new terms discovery, which focuses on the situation where the global knowledge of the item domain is unknown. Discovering the top-ω frequent new terms is an important problem for updating a user-friendly mobile operating system by suggesting words based on a dictionary. Apple iOS [26,85], macOS [86], and Google Chrome [27,87] have integrated LDP to protect users' privacy when collecting and analyzing data. RAPPOR [27] was first used for frequency estimation under LDP, which is introduced in Section 3.1. Afterward, its augmented version A-RAPPOR [87] was proposed and applied in the Google Chrome browser for discovering frequent new terms. A-RAPPOR reduces the huge domain of possible new terms by collecting n-grams instead of full terms. Suppose the character domain is C. Then there will be m = |C|/n such n-gram groups. For each group, A-RAPPOR identifies the significant n-grams, which are then used to construct an m-partite graph. Thus, the frequent new terms can be found efficiently by finding m-cliques in the m-partite graph. However, A-RAPPOR has rather low utility since the n-grams cannot always represent the real terms. The variance of each group is limited to 4e^ε/(N(e^ε − 1)²). To improve the accuracy and reduce the huge computational cost, Wang et al. [88] proposed PrivTrie, which leverages an LDP-compliant algorithm to iteratively construct a trie. When constructing a trie, the naive method is to uniformly allocate the privacy budget to each level of the trie. However, this leads to inaccurate frequency estimations, especially when the height of the trie is large. To address this challenge, PrivTrie only needs to estimate a coarse-grained frequency for each prefix based on an adaptive user grouping strategy, thereby reserving more privacy budget for the actual terms. PrivTrie further enforces consistency among the estimated values to refine the noisy estimations.
Therefore, PrivTrie achieves much higher accuracy and outperforms the previous mechanisms. Besides, Kim et al. [89] proposed a novel algorithm called CCE (Circular Chain Encoding) to discover new words from keystroke data under LDP. CCE leverages the chain rule of n-grams and a fingerprint-based filtering process to improve computational efficiency.
Comparisons and discussions. Table 6 shows the comparisons of LDP-based protocols for frequency estimation on set-valued data, including the communication cost, the key technique, and whether the item domain needs to be known in advance. For set-valued data, the padding-and-sampling technique is commonly adopted to solve the problem of heterogeneous item sizes across users, such as in [30,76,84]. The padding size l is a key parameter for both efficiency and accuracy, and how to choose an optimal l needs further study. Besides, tree-based methods are also widely used in [75,77,80,88] to reconstruct set-valued data. A tree-based method usually requires partitioning users into different groups or splitting the privacy budget across the levels of the tree, which limits the data utility. In this case, an optimal budget allocation strategy needs to be designed. Meanwhile, the adaptive user grouping technique, as used in [88], is also an effective way to improve data utility.

Frequency Estimation on Key-Value Data
Key-value data [90] consist of key-value pairs, each including a key and a value, and are commonly used in big data analysis. For example, key-value pairs (KV pairs) that denote diseases and their diagnosis values can be listed as ⟨Cancer, 0.3⟩, ⟨Fever, 0.06⟩, ⟨Flu, 0.6⟩, etc. While collecting and analyzing key-value data, there are four challenges to consider. (i) Key-value data contain two heterogeneous dimensions, while existing studies mostly focus on homogeneous data. (ii) There are inherent correlations between keys and values. The naive method that deals with key-value data by separately estimating the frequency of the key and the mean of the value under LDP leads to poor utility. (iii) One user may possess multiple key-value pairs, which requires consuming more privacy budget and results in larger noise. (iv) An overall correlated perturbation mechanism on key-value data should consume less privacy budget than two independent perturbation mechanisms for the key and the value, respectively; we can improve data utility by computing the actually consumed privacy budget.
Ye et al. [90] proposed PrivKV, which retains the correlations between keys and values while achieving LDP. PrivKV adopts Harmony [91] to perturb the value v of a KV pair into v* with privacy budget ε_2 and converts the pair ⟨key, v*⟩ into the canonical form ⟨1, v*⟩, whose key is then perturbed by randomized response with privacy budget ε_1. Note that in PrivKV, the value of a key-value pair is randomly drawn from the domain [−1, 1] when a user does not own the key-value pair (i.e., the user has no a priori knowledge about the distribution of the true values). Thus, PrivKV suffers from low accuracy and instability. Therefore, Ye et al. [90] built two algorithms, PrivKVM and PrivKVM+, that run multiple iterations to address this problem. Intuitively, as the number of iterations increases, the accuracy improves since the distribution of the estimated values gets closer to the distribution of the true values.
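The value perturbation primitive inside PrivKV (Harmony-style discretization followed by randomized response) can be sketched as follows. This is a simplified fragment with hypothetical names; key perturbation and the key-value coupling are omitted.

```python
import math
import random

def vpp(v: float, epsilon: float) -> int:
    """Value perturbation: discretize v in [-1, 1] to +/-1 (unbiased, Harmony-style),
    then flip the discretized bit with randomized response."""
    s = 1 if random.random() < (1 + v) / 2 else -1
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return s if random.random() < p else -s

def mean_estimate(reports, epsilon: float) -> float:
    """Debias the randomized response to recover an unbiased mean estimate."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return sum(reports) / len(reports) / (2 * p - 1)
```

Each user transmits a single bit for the value, which is what makes coupling it with the key perturbation cheap.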
Based on PrivKV, Sun et al. [92] proposed several mechanisms for key-value data collection based on direct encoding and unary encoding techniques [35]. Sun et al. [92] also introduced conditional analysis for key-value data for the first time. Specifically, they proposed several mechanisms that support L-way conditional frequency and mean estimation while ensuring good accuracy.
However, the studies in [90,92] do not fully address challenges (iii) and (iv) mentioned previously. On the one hand, they simply sample one pair when facing multiple key-value pairs, which cannot make full use of all the data pairs and may not work well for a large domain. On the other hand, they neglect the privacy budget composition when considering the inherent correlations of key-value data, thus leading to limited data utility.
Gu et al. [93] proposed a correlated key/value perturbation mechanism that reduces the privacy budget consumption and enhances data utility. They designed a Padding-and-Sampling protocol for key-value data to deal with the multiple pairs of each user. Thus, it is no longer necessary to sample a pair from the whole domain (as in PrivKVM [90]); instead, a pair is sampled from the key-value pairs possessed by the user, thus eliminating the effects of a large domain size. Then, they proposed two protocols: PCKV-UE, which adopts unary encoding, and PCKV-GRR, which adopts the generalized randomized response. Rather than sequential composition, both PCKV-UE and PCKV-GRR involve a near-optimal correlated budget composition strategy, thereby minimizing the combined mean square error.
Comparisons and discussions. Table 7 shows the comparisons of frequency/mean estimations on key-value data with LDP. To address the challenge that each user may have multiple data pairs, simple sampling [90,92] is adopted to sample a pair from the whole domain, and padding-and-sampling [93] is adopted to sample a pair from the key-value pairs possessed by the user. Besides, PrivKVM [90] and PCKV-UE/PCKV-GRR [93] consider the correlations between keys and values by iteration and by correlated perturbation, respectively. However, CondiFre [92] lacks consideration of learning correlations when conducting conditional analysis. Furthermore, both PrivKV [90] and CondiFre [92] achieve LDP based on two independent perturbations with fixed privacy budgets by sequential composition, while PCKV-UE/PCKV-GRR [93] hold a tighter privacy budget composition strategy that makes a near-optimal allocation of the privacy budget.

Frequency Estimation on Ordinal Data
Compared to categorical data, ordinal data have a linear ordering among categories, and include ordered categorical data, discrete numerical data (e.g., discrete sensor/metering data), and preference ranking data.
When quantifying the indistinguishability of two ordinal values under LDP, the work in [53] measured the distance of two ordinal values by ε-geo-indistinguishability. That is, a mechanism M satisfies ε-geo-indistinguishability if it holds that P[M(X_i) ∈ Y] ≤ e^{ε·d(X_i, X_j)}·P[M(X_j) ∈ Y] for any possible pair X_i, X_j ∈ X. The distance d(X_i, X_j) of ordinal values X_i and X_j can be measured by the Manhattan distance or the squared Euclidean distance. Based on ε-geo-indistinguishability, Wang et al. [70,94] proposed the subset exponential mechanism (SEM), realized by a tweaked version of the exponential mechanism [22]. Besides, they also proposed a circling subset exponential mechanism (CSEM) for ordinal data with uniform topology. For both SEM and CSEM, the authors provided theoretical error bounds showing that their mechanisms reduce nearly a fraction of exp(−ε/2) of the error for frequency estimation.
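A distance-aware perturbation in the spirit of the exponential mechanism can be sketched as follows. This is an illustrative mechanism with utility −|x − y| under the Manhattan distance, not the SEM/CSEM algorithms of [70,94], and the function name is ours.

```python
import math
import random

def ordinal_perturb(x: int, k: int, epsilon: float) -> int:
    """Perturb an ordinal value x in [0, k): output y is chosen with probability
    proportional to exp(-eps * |x - y| / 2), so nearby values are preferred."""
    weights = [math.exp(-epsilon * abs(x - y) / 2) for y in range(k)]
    return random.choices(range(k), weights=weights)[0]
```

Because the probability ratio between any two inputs is bounded by exp(ε·d(x_1, x_2)), the output distribution degrades gracefully with distance instead of treating all wrong outputs uniformly as GRR does.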
Preference ranking data are also one of the most common representations of personal data and highly sensitive in some applications, such as preference rankings on politics or service quality. Essentially, preference ranking data can be regarded as categorical data that hold an order among different items. Given an item set K = {v_1, v_2, · · · , v_k}, a preference ranking of K is an ordered list that contains all k items in K. Denote a preference ranking as σ = ⟨σ(1), σ(2), · · · , σ(k)⟩, where σ(j) = v_i means that item v_i's rank under σ is j. The goal of collecting preference rankings is to estimate the distribution of all different rankings from N users. It is easy to verify that the domain of all rankings has cardinality k!, which leads to excessive noise and low accuracy when k is large. Yang et al. [95] proposed SAFARI, which approximates the overall distribution over a smaller domain chosen based on a riffle independence model. SAFARI greatly reduces the noise amount and improves data utility.
Voting data are to some extent a kind of preference ranking. By aggregating the preference rankings based on a certain positional voting rule (e.g., Borda, Nauru, Plurality [96-98]), we can obtain collective decisions. To avoid leaking personal preferences in a voting system, the work in [99] collected and aggregated voting data with LDP while ensuring usefulness and soundness. Specifically, a weighted sampling mechanism and an additive mechanism are proposed for LDP-based voting aggregation under general positional voting rules. Compared to the naïve Laplace mechanism, the weighted sampling mechanism and the additive mechanism reduce the maximum magnitude risk bound from +∞ to O(k³/N) and O(k²/N), respectively, where k is the number of vote candidates (i.e., the domain size) and N is the number of users.
As one of the fundamental data analysis primitives, range query aims at estimating the fractions or quantiles of the data within a specified interval [100,101], which is also an analysis task on ordinal data. The studies in [68,102] have proposed approaches to support range queries with LDP while ensuring good accuracy. They designed two methods to describe and analyze range queries, based on hierarchical histograms and the Haar wavelet transform, respectively. Both methods use OLH [35] to achieve LDP with low communication cost and high accuracy. Besides, local d-privacy is a generalized notion of LDP under a distance metric, which is adopted to assign different perturbation probabilities to different inputs based on the distance metric. Gu et al. [103] used local d-privacy to support both range queries and frequency estimation. They proposed an optimization framework that solves linear equations of the perturbation probabilities rather than solving an optimization problem directly, which not only reduces the computation cost but also makes the optimization problem always solvable when using d-privacy.

Frequency Estimation on Numeric Data
Most existing studies compute frequency estimations on categorical data. However, many attributes are numerical in nature, such as income and age. Computing frequency estimations of numeric data also plays an important role in practice.
For numerical distribution estimation with LDP, the naïve method is to discretize the numerical domain and apply the general LDP protocols directly. However, the data utility of the naïve method relies heavily on the granularity of the discretization. Even worse, an optimal discretization strategy depends on the privacy parameters and the original distributions of the numeric attributes. Thus, it is a big challenge to find the optimal discretization strategy. Li et al. [104] used the ordered nature of the numerical domain to achieve a better trade-off between privacy and utility. They proposed a novel mechanism based on expectation-maximization and smoothing techniques, which improves the data utility significantly.

Marginal Release on Multi-Dimensional Data
The marginal table is "the workhorse of data analysis" [31]. By obtaining the marginal tables of a set of attributes, we can learn the underlying distributions of multiple attributes, identify the correlated attributes, and describe the probabilistic relationships between causes and effects. Thus, k-way marginal release has been widely investigated with LDP [31,72,73].
Denote A = {A_1, A_2, · · · , A_d} as the d attributes of d-dimensional data. For each attribute A_j (j = 1, 2, · · · , d), the domain of A_j is denoted as Ω_j = {ω_j^1, ω_j^2, · · · , ω_j^{|Ω_j|}}, where ω_j^i is the i-th value of Ω_j and |Ω_j| is the cardinality of Ω_j. The marginal table is defined as follows.

Definition 11 (Marginal Table [31]). Given d-dimensional data, the marginal operator C_β computes all frequencies of the different attribute combinations that are decided by β ∈ {0, 1}^d, where |β| denotes the number of 1s in β and |β| = k ≤ d. The marginal table contains all the returned results of C_β.

Example 1. When d = 4 and β = 0110, it means that we estimate the probability distributions of all combinations of the second and third attributes; C_0110 returns the frequency distributions of all such combinations.

The k-way marginal is the probability distribution of any k attributes among the d attributes. Definition 12 (k-way Marginal [31]). The k-way marginal is the probability distribution of k attributes among the d attributes, i.e., |β| = k. For a fixed k, the set of all possible k-way marginals corresponds to all C(d, k) distinct ways of picking k attributes from d, which are called the full k-way marginals.
The k-way marginal release is to estimate the k-way marginal probability distribution for any k attributes A_{j_1}, A_{j_2}, · · · , A_{j_k} chosen from A. The k-way marginal distribution of attributes A_{j_1}, A_{j_2}, · · · , A_{j_k} is denoted as P(A_{j_1} A_{j_2} · · · A_{j_k}). That is, the frequency of every value combination (ω_{j_1}, ω_{j_2}, · · · , ω_{j_k}) is estimated for all ω_{j_1} ∈ Ω_{j_1}, ω_{j_2} ∈ Ω_{j_2}, · · · , ω_{j_k} ∈ Ω_{j_k}.
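The marginal operator C_β can be illustrated by the following non-private computation on raw records, matching Example 1; the function name and the toy records are ours.

```python
from collections import Counter

def marginal(records, beta):
    """Marginal operator C_beta (Definition 11): frequencies of the value
    combinations of the attributes selected by the 0/1 vector beta."""
    idx = [j for j, b in enumerate(beta) if b == 1]
    counts = Counter(tuple(r[j] for j in idx) for r in records)
    return {combo: c / len(records) for combo, c in counts.items()}
```

Under LDP the server never sees the raw records, so the same quantity must instead be reconstructed from perturbed reports, which is where the techniques below come in.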

k-Way Marginal Probability Distribution Estimation
The randomized response technique [27] can be naïvely leveraged to achieve LDP when computing k-way marginal probability distributions. However, both the efficiency and accuracy are seriously affected by the "curse of high dimensionality": the total domain cardinality is ∏_{j=1}^{k} |Ω_j|, which increases exponentially as k increases.
The EM-based algorithm with LDP [87] is restricted to 2-way marginals. When k is large, it will lead to high time/space overheads. The work in [71] proposed a Lasso-based regression mechanism that can estimate high-dimensional marginals efficiently by extracting key features with high probabilities. Besides, Ren et al. [72] proposed LoPub to find compactly correlated attributes to achieve dimensionality reduction, which further reduces the time overhead and improves data utility.
Nonetheless, k-way marginal release still suffers from low data utility and high computational overhead when k becomes larger. To solve this, the work in [73] proposed to leverage Copula theory to synthesize multi-dimensional data with respect to the marginal distributions and the attribute dependence structure. It only needs to estimate one-way and two-way marginal distributions instead of k-way marginals, thus circumventing the exponential growth of the domain cardinality and avoiding the curse of dimensionality. Afterward, Wang et al. [105] further leveraged the C-vine Copula to take the conditional dependencies among high-dimensional attributes into account, which significantly improves data utility.
Cormode et al. [31] investigated marginal release under different kinds of LDP protocols. They further proposed to materialize marginals as a collection of coefficients based on the Hadamard transform (HT). The underlying motivation of using HT is that the computation of k-way marginals requires only a few coefficients in the Fourier domain. Thus, this method improves the accuracy and reduces the communication cost. Nonetheless, this method is designed for binary attributes; non-binary attributes need to be pre-processed into binary form, leading to higher dimensions. To further improve the accuracy, Zhang et al. [74] proposed a consistent adaptive local marginal (CALM) algorithm. CALM is inspired by PriView [106], which reconstructs k-way marginals from m low-way marginals, each over l attributes (i.e., a synopsis). Besides, the work in [107,108] focused on answering multi-dimensional analytical queries, which are essentially formalized as computing k-way marginals with LDP.
Comparisons and discussions. Table 8 summarizes the LDP-based algorithms for k-way marginal release. To improve efficiency, the existing methods try to reduce the large domain space by various techniques, such as HT and dimensionality reduction. As we can see, the variances of the existing methods are relatively large, leading to limited data utility. Although subset selection is a useful way to reduce the communication cost and variance, it suffers from sampling error when constructing low-dimensional synopses. Therefore, designing mechanisms with high data utility and low costs still faces big challenges when d is large.
Var is the variance of estimating a single cell in the full contingency table; 2^l is the size of each of the m low-dimensional marginals of the dataset.

Conditional Probability Distribution Estimation
The conditional probability is also important for statistics. Sun et al. [92] investigated conditional distribution estimation for the keys in key-value data. They formalized k-way conditional frequency estimation and applied advanced LDP protocols to compute k-way conditional distributions. Besides, Xue et al. [109] proposed to compute conditional probabilities based on k-way marginals and further train a Bayes classifier.

Frequency Estimation on Evolving Data
So far, most academic literature focuses on one-time frequency estimation with LDP. However, privacy leakage gradually accumulates over time under centralized DP [110][111][112], and the same holds for LDP [28,113]. Therefore, when applying LDP for dynamic statistics over time, an LDP-compliant method should take the time factor into account. Otherwise, the mechanism can hardly achieve the expected privacy protection over long time scales. For example, Tang et al. [86] pointed out that the privacy parameters of Apple's implementation on MacOS actually become unreasonably large even over relatively short time periods. Therefore, longitudinal attacks on evolving data require careful consideration.
Erlingsson et al. [27] adopted a heuristic memoization technique to provide longitudinal privacy protection when multiple records are collected from the same user. Their method consists of a Permanent randomized response and an Instantaneous randomized response, performed in sequence with a memoization step in between. The Permanent randomized response outputs a perturbed answer, which is memoized and reused in place of the true answer in all future reports. The Instantaneous randomized response then re-randomizes the memoized answer at each report, which prevents possible tracking externalities. In particular, the longitudinal privacy protection in [27] assumes that the user value does not change over time, similar to the continual observation model [114]. Thus, the approach in [27] cannot guarantee strong privacy for users whose numeric values change frequently.
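To make the two-step randomization concrete, the following is a minimal one-bit sketch of the permanent/instantaneous pattern. It is not RAPPOR itself (RAPPOR operates on Bloom-filter bit vectors); the parameters f, p, and q are illustrative randomization probabilities of our choosing.

```python
import random

def permanent_rr(true_bit, f):
    """Permanent randomized response: run once per (user, value),
    then memoized and reused in place of the true bit forever after."""
    r = random.random()
    if r < f / 2:
        return 1
    if r < f:
        return 0
    return true_bit

def instantaneous_rr(memoized_bit, p, q):
    """Instantaneous randomized response: re-randomizes the memoized
    bit at every report to prevent longitudinal tracking."""
    prob_one = q if memoized_bit == 1 else p
    return 1 if random.random() < prob_one else 0

random.seed(0)
memoized = permanent_rr(1, f=0.5)                      # memoization step
reports = [instantaneous_rr(memoized, p=0.25, q=0.75)  # repeated reports
           for _ in range(10)]
```

Because every report is derived from the same memoized bit, an observer watching many reports can only recover the memoized value, never the true bit beyond the permanent response's guarantee.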
Inspired by [27], Ding et al. [28] used permanent memoization for continual counter data collection and histogram estimation. They first designed an ω-bit mechanism, ωBitFlip, to estimate the frequency of counter values in a discrete domain with k buckets. In ωBitFlip, each user randomly draws ω bucket numbers without replacement from [k], denoted as j_1, j_2, ..., j_ω. At each time t, each user randomizes her data v_t ∈ [k] and reports a vector b_t = [(j_1, b_t(j_1)), (j_2, b_t(j_2)), ..., (j_ω, b_t(j_ω))], where each b_t(j_z) (z = 1, 2, ..., ω) is a random 0/1 bit that equals 1 with probability e^{ε/2}/(e^{ε/2} + 1) if v_t = j_z, and with probability 1/(e^{ε/2} + 1) otherwise. Assume that, for a bucket v ∈ [k], n_v users drew bucket v and the sum of the received 1 bits for it is N̂_v. Then, the estimated frequency of v is f̂(v) = (k/(Nω)) · (N̂_v(e^{ε/2} + 1) − n_v)/(e^{ε/2} − 1). As we can see, ωBitFlip reduces to the mechanism of Duchi et al. [115] when ω = k. Ding et al. [28] proved that ωBitFlip holds an error bound of O(√(k log k)/(ε√(ωN))). In naïve memoization, each user memoizes a perturbed value based on the mapping f_k : [k] → {0, 1}^k, which leads to privacy leakage because a memoized response can uniquely identify its input. To tackle this, Ding et al. [28] proposed the ω-bit permanent memoization mechanism ωBitFlipPM based on ωBitFlip. ωBitFlipPM memoizes each response under a mapping f_ω : [k] → {0, 1}^ω, which avoids the privacy leakage since multiple buckets are mapped to the same response.
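A minimal sketch of ωBitFlip, assuming the perturbation probabilities and unbiased estimator described above; function and variable names are ours, and the simulated population (every user holding bucket 3) is illustrative.

```python
import math
import random

def omega_bit_flip(v, k, omega, eps):
    """Client: draw omega buckets without replacement and report a biased
    bit for each; the bit is more likely 1 for the user's true bucket v."""
    e = math.exp(eps / 2)
    buckets = random.sample(range(k), omega)
    return [(j, 1 if random.random() < (e if j == v else 1) / (e + 1) else 0)
            for j in buckets]

def estimate_frequency(all_reports, v, k, omega, eps):
    """Server: unbiased frequency estimate for bucket v (sum form of the
    estimator; each contributing bit is debiased, then rescaled by k/(N*omega))."""
    e = math.exp(eps / 2)
    n = len(all_reports)
    s = sum((bit * (e + 1) - 1) / (e - 1)
            for report in all_reports for (j, bit) in report if j == v)
    return k * s / (n * omega)

random.seed(0)
k, omega, eps, n = 8, 4, 4.0, 3000
reports = [omega_bit_flip(3, k, omega, eps) for _ in range(n)]  # all users hold 3
est = estimate_frequency(reports, 3, k, omega, eps)             # should be near 1.0
```

Note that each user's report carries only ω (bucket, bit) pairs rather than a full k-bit vector, which is exactly the communication saving over the ω = k case.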
Moreover, Joseph et al. [113] proposed a novel LDP-compliant mechanism, THRESH, for collecting up-to-date statistics over time. The key idea of THRESH is to update the global estimate only when it might have become sufficiently inaccurate. To identify these update-needed epochs, Joseph et al. designed a voting protocol in which users privately report a vote on whether they believe the global estimate needs to be updated. THRESH ensures that the privacy guarantee degrades only with the number of times the statistic changes, rather than the number of times it is computed. Therefore, it achieves strong privacy protection for frequency estimation over time while ensuring good accuracy.

Mean Value Estimation with LDP
This Section summarizes the task of mean value estimation for numeric data with LDP, including mean value estimation on numeric data and mean value estimation on evolving data.
Formally, let D = {V_1, V_2, ..., V_N} be the data of all users, where N is the number of users. V_i = (v_1^i, v_2^i, ..., v_d^i) denotes the data of the i-th user, which consists of d numeric attributes A_1, A_2, ..., A_d. Each v_j^i (j ∈ [1, d]) denotes the value of the j-th attribute of the i-th user. Without loss of generality, the domain of each numeric attribute is normalized into [−1, 1]. Mean estimation aims to estimate the mean value of each attribute A_j (j ∈ [1, d]) over the N users, i.e., v̄_j = (1/N) Σ_{i=1}^N v_j^i.

Mean Value Estimation on Numeric Data
Let Ṽ_i = (ṽ_1^i, ṽ_2^i, ..., ṽ_d^i) be the perturbed d-dimensional data of user i. Given a perturbation mechanism M, we use E[ṽ_j] to denote the expectation of the output ṽ_j given an input v_j. To achieve LDP, a perturbation mechanism should satisfy the following two constraints. The first constraint (i.e., Equation (55)) requires the mechanism to be unbiased, i.e., E[ṽ_j] = v_j. The second constraint (i.e., Equation (56)) requires that the probabilities of the outputs sum to one, i.e., Σ_{ṽ∈Ṽ} Pr[M(v_j) = ṽ] = 1, where Ṽ is the output range of M.
The Laplace mechanism [45] under DP can be applied in a distributed manner to achieve LDP. Based on the Laplace mechanism, each user's data is perturbed by adding random Laplace noise, i.e., ṽ_j^i = v_j^i + Lap(λ), where Lap(λ) is a random variable drawn from a Laplace distribution with probability density function pdf(v) = (1/(2λ)) exp(−|v|/λ). Please note that the sensitivity is ∆ = 2 since each numeric value lies in the range [−1, 1], and the privacy budget for each dimension is ε/d, so λ = ∆d/ε = 2d/ε. The aggregator then computes the average of all received noisy reports, (1/N) Σ_{i=1}^N ṽ_j^i, as an unbiased estimate of the mean of each attribute, with an error bound of O(d√(log d)/(ε√N)). As we can see, the Laplace mechanism incurs excessive error when d becomes large.
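A minimal sketch of per-user Laplace perturbation and server-side averaging, assuming the 2d/ε noise scale above; the inverse-CDF Laplace sampler, the attribute value 0.5, and ε = 1 are illustrative choices of ours.

```python
import math
import random

def laplace_ldp_perturb(values, eps):
    """Perturb one user's record in [-1, 1]^d: sensitivity 2 per attribute
    and budget eps/d per dimension give noise scale 2 * d / eps."""
    d = len(values)
    scale = 2.0 * d / eps

    def lap():
        # inverse-CDF sampling of Laplace(0, scale)
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    return [v + lap() for v in values]

random.seed(1)
n, eps = 20000, 1.0
# every user holds the one-dimensional value 0.5; the server simply averages
noisy = [laplace_ldp_perturb([0.5], eps)[0] for _ in range(n)]
est = sum(noisy) / n
```

With d > 1 the noise scale grows linearly in d, which is exactly why the error becomes excessive in high dimensions.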
Duchi et al. [115] proposed an LDP-compliant method for collecting multi-dimensional numeric data. The basic idea of Duchi et al.'s method is to perturb each user's data by a randomized response technique according to a carefully designed probability distribution while ensuring an unbiased estimation. Each user's tuple V_i ∈ [−1, 1]^d is perturbed into a noisy vector Ṽ_i ∈ {−B, B}^d, where B is a constant determined by d and ε (its closed form, which involves binomial coefficients, is given in [115]). Duchi et al.'s method takes a tuple V_i ∈ [−1, 1]^d as input and discretizes the d-dimensional data into X := [X_1, X_2, ..., X_d] ∈ {−1, 1}^d by sampling each X_j independently from the distribution Pr[X_j = 1] = (1 + v_j^i)/2 and Pr[X_j = −1] = (1 − v_j^i)/2.
Once X is sampled, let T^+ (resp. T^−) be the set of all tuples Ṽ_i ∈ {−B, B}^d such that Ṽ_i · X > 0 (resp. Ṽ_i · X ≤ 0). The algorithm returns a noisy vector based on the value of a Bernoulli variable u with Pr[u = 1] = e^ε/(e^ε + 1): if u = 1, it returns a vector sampled uniformly at random from T^+; otherwise, it samples uniformly at random from T^−. Furthermore, Nguyên et al. proposed Harmony [91], which is simpler than Duchi et al.'s method for collecting multi-dimensional data with LDP but achieves the same privacy guarantee and asymptotic error bound. Given an input V_i, Harmony returns a perturbed tuple Ṽ_i that has a non-zero value on only one dimension j ∈ [1, d]. That is, Harmony samples one dimension j uniformly at random from [1, d] and returns a noisy value ṽ_j^i drawn from the two-point distribution over x = ±((e^ε + 1)/(e^ε − 1)) · d, where Pr[ṽ_j^i = ((e^ε + 1)/(e^ε − 1)) · d] = (v_j^i(e^ε − 1) + e^ε + 1)/(2(e^ε + 1)), and ṽ_j^i = −((e^ε + 1)/(e^ε − 1)) · d otherwise.
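Harmony's client-side step can be sketched as follows, with the discretize-then-flip steps folded into a single output probability; names and the simulated inputs are our assumptions.

```python
import math
import random

def harmony_perturb(v, eps):
    """Harmony client: sample one dimension of v in [-1, 1]^d and report a
    single value in {-c, +c}; all other dimensions are reported as 0."""
    d = len(v)
    e = math.exp(eps)
    j = random.randrange(d)
    c = (e + 1) / (e - 1) * d
    # discretization + randomized response collapse into this probability
    p_plus = (v[j] * (e - 1) + e + 1) / (2 * (e + 1))
    out = [0.0] * d
    out[j] = c if random.random() < p_plus else -c
    return out

random.seed(2)
n, eps = 30000, 2.0
reports = [harmony_perturb([0.5, -0.5], eps) for _ in range(n)]
est0 = sum(r[0] for r in reports) / n   # unbiased estimate of 0.5
est1 = sum(r[1] for r in reports) / n   # unbiased estimate of -0.5
```

One can check unbiasedness directly: conditioned on dimension j being sampled, the expectation of the report is c·(2·p_plus − 1) = d·v_j, and j is sampled with probability 1/d.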
As we can see, in Harmony, each user only needs to report one value to the aggregator. Thus, Harmony has a lower communication overhead of O(1) than Duchi et al.'s method while retaining the same asymptotic error bound. Wang et al. [29] further proposed the piecewise mechanism (PM), which has lower variance and is easier to implement than Duchi et al.'s method.
We first introduce PM for one-dimensional data, i.e., d = 1. Given an input v_i ∈ [−1, 1], PM returns a perturbed value ṽ_i in [−C, C], where C = (e^{ε/2} + 1)/(e^{ε/2} − 1). Let l(v_i) = ((C + 1)/2) · v_i − (C − 1)/2 and r(v_i) = l(v_i) + C − 1. PM samples a value u uniformly at random from [0, 1]: if u < e^{ε/2}/(e^{ε/2} + 1), it returns ṽ_i uniformly at random from [l(v_i), r(v_i)]; otherwise (e^{ε/2}/(e^{ε/2} + 1) ≤ u ≤ 1), it returns ṽ_i uniformly at random from [−C, l(v_i)) ∪ (r(v_i), C]. Wang et al. [29] proved that the worst-case variance of PM for one-dimensional data is 4e^{ε/2}/(3(e^{ε/2} − 1)^2). Recall that the variance of Duchi et al.'s method for one-dimensional data is (e^ε + 1)^2/(e^ε − 1)^2. It can be verified that the variance of PM is smaller than that of Duchi et al.'s method when ε > 1.29.
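The one-dimensional PM sampling procedure above can be sketched as follows (the mean-estimation simulation at the end, with input 0.3 and ε = 2, is illustrative):

```python
import math
import random

def piecewise_mechanism(v, eps):
    """PM client for v in [-1, 1]: returns a value in [-c, c], landing in
    the central piece [l, r] with probability e^(eps/2)/(e^(eps/2)+1)."""
    s = math.exp(eps / 2)
    c = (s + 1) / (s - 1)
    l = (c + 1) / 2 * v - (c - 1) / 2
    r = l + c - 1
    if random.random() < s / (s + 1):
        return random.uniform(l, r)          # high-probability central piece
    # otherwise sample uniformly from the two tails [-c, l) and (r, c]
    left_len, right_len = l - (-c), c - r
    u = random.uniform(0, left_len + right_len)
    return -c + u if u < left_len else r + (u - left_len)

random.seed(3)
n, eps = 30000, 2.0
est = sum(piecewise_mechanism(0.3, eps) for _ in range(n)) / n  # near 0.3
```

The central piece [l(v), r(v)] slides with v, so outputs concentrate around the true value while the two tails provide the deniability required by ε-LDP.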
Furthermore, Wang et al. [29] extended PM to collect multi-dimensional data based on the idea of Harmony [91]. Given an input tuple V_i ∈ [−1, 1]^d, it returns a perturbed Ṽ_i that has non-zero values on at most m dimensions, where m = max{1, min{d, ⌊ε/2.5⌋}}. In this way, PM for multi-dimensional data has an error bound of O(√(d log d)/(ε√N)). In particular, m is much smaller than d and equals 1 for ε < 5.

Comparisons and discussions. Table 9 summarizes the LDP algorithms for mean estimation on multi-dimensional numeric data. As we can see, the Laplace mechanism [45] has a larger error bound of O(d√(log d)/(ε√N)), while Duchi et al.'s method, Harmony, and PM all achieve O(√(d log d)/(ε√N)). The last column of Table 9 shows the variance of each mechanism for one-dimensional data. It can be verified that the variance of PM is always smaller than that of Laplace, but slightly worse than those of Duchi et al.'s method and Harmony when ε < 1.29. Observing this, Wang et al. [29] proposed to combine PM and Duchi et al.'s method into a new hybrid mechanism (HM). They proved that, for ε > 0.61, the worst-case variance of HM for d = 1 is 4/(3(e^{ε/2} − 1)) + (e^ε + 1)^2/(e^{ε/2}(e^ε − 1)^2). Thus, it can be verified that the variance of HM is always smaller than those of the other mechanisms in Table 9.

Mean Value Estimation on Evolving Data
As pointed out in Section 3.7, privacy leakage accumulates as time goes on. This also holds for mean estimation. Ding et al. [28] employed both α-point rounding and memoization techniques to estimate the mean value of counter data while ensuring strong privacy protection over time. The basic idea of α-point rounding is to discretize the data domain with a discretization granularity s. Ding et al. proposed the 1-bit mechanism 1BitMean for mean estimation. Assume that each user i has a private value v_i(t) ∈ [0, r] at time t. 1BitMean requires each user to report one bit b_i(t) drawn from the distribution: b_i(t) = 1 with probability 1/(e^ε + 1) + (v_i(t)/r) · (e^ε − 1)/(e^ε + 1), and b_i(t) = 0 otherwise.
The mean value of the N users at time t can then be estimated as m̂(t) = (r/N) Σ_{i=1}^N (b_i(t)(e^ε + 1) − 1)/(e^ε − 1). Based on 1BitMean, the procedure of α-point rounding consists of the following four steps. (i) Discretize the data domain [0, r] into s parts. (ii) Each user i randomly picks a value α_i ∈ {0, 1, ..., s − 1}. (iii) Each user computes and memoizes a 1-bit response by invoking 1BitMean. (iv) Each user performs α-rounding over the discretized sub-intervals: the value is rounded to the left endpoint of its sub-interval if v_i + α_i falls short of the right endpoint, and to the right endpoint otherwise. Ding et al. [28] proved that the accuracy of the α-point rounding mechanism is the same as that of 1BitMean and is independent of the choice of the discretization granularity s.
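The client bit and the server-side estimator can be sketched as follows, assuming the 1BitMean distribution and estimator given above; the parameter values (r = 100, ε = 1, every user holding 40) are illustrative.

```python
import math
import random

def one_bit_mean(v, r, eps):
    """1BitMean client: report a single bit whose bias encodes v in [0, r]."""
    e = math.exp(eps)
    p = 1 / (e + 1) + (v / r) * (e - 1) / (e + 1)
    return 1 if random.random() < p else 0

def estimate_mean(bits, r, eps):
    """Server: unbiased mean estimate from the collected bits."""
    e = math.exp(eps)
    return (r / len(bits)) * sum((b * (e + 1) - 1) / (e - 1) for b in bits)

random.seed(4)
n, r, eps = 20000, 100.0, 1.0
bits = [one_bit_mean(40.0, r, eps) for _ in range(n)]  # every user holds 40
est = estimate_mean(bits, r, eps)                      # near 40
```

Each user's entire report is one bit, which is what makes the mechanism practical for continual telemetry collection at scale.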

Machine Learning with LDP
Machine learning, as an essential data analysis method, has been applied to various fields. However, the training process may be vulnerable to many attacks, such as membership inference attacks [116], memorizing model attacks [117], and model inversion attacks [118]. For example, adversaries may extract information memorized during training to approximate the sensitive data of users [117]. Worse still, Fredrikson et al. [118] showed that an adversary can recover images from a facial recognition system via model inversion attacks, which demonstrates the weakness of a trained machine learning model.
The machine learning algorithms with global DP were extensively studied by imposing private training [119][120][121][122][123]. With the introduction of LDP, the machine learning algorithms with LDP were also investigated to achieve privacy protection in a distributed way. The following subsections summarize the existing machine learning algorithms with LDP from the perspective of supervised learning, unsupervised learning, empirical risk minimization, deep learning, reinforcement learning, and federated learning.

Supervised Learning
Supervised learning algorithms focus on training a prediction model describing data classes via a set of labeled datasets.
Yilmaz et al. [124] proposed to train a Naïve Bayes classifier with LDP. Naïve Bayes classification finds the most probable label for a given new instance. In order to compute the conditional distributions, the relationships between feature values and class labels must be kept when perturbing the input data. To keep this relationship, Yilmaz et al. first transformed each user's value and label into a single new value and then performed LDP perturbation. Xue et al. [109] also aimed at training a Naïve Bayes classifier with LDP. They proposed to leverage joint distributions to compute the conditional distributions. Besides, Berrett and Butucea [125] further considered the binary classification problem with LDP.
High-dimensionality is a big challenge for training a classifier with LDP, which will result in huge time cost and low accuracy. One of the traditional solutions is dimensionality reduction, such as Principal Component Analysis (PCA) [126]. However, the effective dimensionality reduction methods with LDP in machine learning still need further research. Moreover, user partition is always used when learning a model with LDP. For example, the work in [124] partitions users into two groups to compute the mean value and squares, respectively. However, simply partitioning users into groups will reduce the estimation accuracy. Therefore, research on supervised learning with LDP still has a long way to go.

Unsupervised Learning
The problem of clustering has been studied under centralized DP [127][128][129]. Under the LDP model, Nissim and Stemmer [130] conducted 1-cluster computation by finding a minimum enclosing ball. Moreover, Sun et al. [131] investigated non-interactive clustering under LDP. They extended the Bit Vector mechanism in [132,133] by modifying the encoding process and proposed the kCluster algorithm in an anonymous space based on the improved encoding. Furthermore, Li et al. [134] proposed a local-clustering-based collaborative filtering mechanism that uses the kNN algorithm to group items and ensures an item-level privacy specification.
For clustering in the local model, each respondent randomizes her/his own data and reports to an untrusted data curator. Although the accuracy of local clustering is not as good as that in the central model, local clustering algorithms achieve stronger privacy protection for users and are more practical for privacy specifications, such as personalized privacy parameters [57,135]. Xia et al. [136] applied LDP to K-means clustering by directly perturbing the data of each user. They proposed a budget allocation scheme that reduces the noise scale to improve accuracy. However, the investigation of clustering under LDP is still at an early stage.

Empirical Risk Minimization
In machine learning, the error computed from training data is called the empirical risk. Empirical risk minimization (ERM) is the process of computing an optimal model from a set of parameters by minimizing the expected loss [137]. A loss function L(θ; x, y) is parameterized by an example (x, y) and maps the parameter vector θ to a real number. The goal of ERM is to identify a parameter vector θ* such that θ* = argmin_θ (1/N) Σ_{i=1}^N L(θ; x_i, y_i) + (λ/2)‖θ‖², where λ > 0 is the regularization parameter. By choosing different loss functions, ERM can be used to solve specific learning tasks, such as logistic regression, linear regression, and support vector machine (SVM) classification. Both the interactive model and the non-interactive model under LDP have been discussed in the existing literature for natural learning problems [32,49]. The interactive model has better accuracy but leads to high network delay and a weaker privacy guarantee. The non-interactive model is strictly stronger and more practical in most settings.
Smith et al. [138] initiated the investigation of interaction in LDP for natural learning problems. They pointed out that for a large class of convex optimization problems with LDP, the server needs to exchange information with each user back and forth in sequence, which will lead to network delays. Thus, they investigated the necessity of the interactivity to optimize convex functions. Smith et al. also provided new algorithms that are either non-interactive or only use a few rounds of interaction. Moreover, Zheng et al. [139] further proposed more efficient algorithms based on Chebyshev expansion under non-interactive LDP, which achieves quasi-polynomial sample complexity bound.
However, the sample complexity in [138,139] is exponential in the dimensionality and becomes less meaningful in high dimensions. In fact, high dimensions are quite common in machine learning. Wang et al. [32] proposed LDP algorithms whose error bounds depend on the Gaussian width, which improves on the result in [138], but the sample complexity is still exponential in the dimensionality. Their follow-up work [140] improves the sample complexity to quasi-polynomial. However, the practical performance of these algorithms is still limited. Therefore, for the generalized linear model (GLM), Wang et al. [141] further proved that when the feature vector of the GLM is sub-Gaussian with bounded ℓ_1-norm, the LDP algorithm for the GLM achieves fully polynomial sample complexity. Furthermore, Wang and Xu [142] addressed the principal component analysis (PCA) problem under non-interactive LDP and proved lower bounds on the minimax risk in both the low- and high-dimensional settings.
Moreover, the works in [29,91] built classical machine learning models under LDP via empirical risk minimization (ERM), solved by stochastic gradient descent (SGD). They considered three common machine learning models: linear regression, logistic regression, and SVM classification. The SGD algorithm is adopted to compute the target parameter θ that yields the minimum (or at least a small) loss. Specifically, at each iteration t + 1, the parameter vector is updated as θ_{t+1} = θ_t − η · ∇L(θ_t; x, y), where η is the learning rate, (x, y) is the tuple of a randomly selected user, and ∇L(θ_t; x, y) is the gradient of the loss function L at θ_t. With LDP, each gradient ∇L is perturbed into a noisy gradient ∇L* by an LDP-compliant algorithm before being reported to the aggregator, and the update becomes θ_{t+1} = θ_t − η · (1/|G|) Σ_{i∈G} ∇L*_i, where ∇L*_i is the perturbed gradient of user u_i and |G| is the batch size. When considering the data center network (DCN), Fan et al. [143] investigated LDP-based support vector regression (SVR) classification for cloud-computing-supported data centers. Their method achieves LDP based on the Laplace mechanism. Similarly but differently, Yin et al. [144] studied LDP-based logistic regression classification involving three specific steps, i.e., noise addition, feature selection, and logistic regression. LDP has also been applied to online convex optimization to avoid disclosing any parameters while realizing unconstrained adaptive online learning [145]. Besides, Jun and Orabona [146] studied the parameter-free SGD problem under LDP. They proposed BANCO, which achieves the convergence rate of tuned SGD without repeated runs, thus reducing privacy loss and saving the privacy budget.
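A minimal sketch of one noisy-gradient SGD step under this pattern. For illustration, the per-coordinate perturbation here uses the Laplace mechanism with a clipping bound; [29,91] instead use their respective mean-estimation mechanisms (e.g., PM/Harmony), and the clipping bound, budget split, and names below are our assumptions.

```python
import math
import random

def laplace_noise(scale):
    # inverse-CDF sampling of Laplace(0, scale)
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturb_gradient(grad, eps, clip=1.0):
    """Client: clip each coordinate to [-clip, clip], then add Laplace noise
    with per-coordinate sensitivity 2*clip and per-dimension budget eps/d."""
    d = len(grad)
    scale = 2.0 * clip * d / eps
    return [max(-clip, min(clip, g)) + laplace_noise(scale) for g in grad]

def sgd_step(theta, noisy_grads, eta):
    """Server: average the |G| noisy gradients and take one descent step."""
    avg = [sum(col) / len(noisy_grads) for col in zip(*noisy_grads)]
    return [t - eta * g for t, g in zip(theta, avg)]

random.seed(5)
true_grad = [1.0, -1.0]
noisy = [perturb_gradient(true_grad, eps=2.0) for _ in range(2000)]
theta = sgd_step([0.0, 0.0], noisy, eta=0.1)  # moves opposite to the gradient
```

Because the noise is zero-mean, averaging over the batch G cancels most of it, which is why a large batch keeps the update direction usable despite per-user perturbation.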

Deep Learning
Deep learning has played an important role in natural language processing, image classification, and so on. However, the adversaries can easily inject the malicious algorithms in the training process, and then extract and approximate the sensitive information of users [117,147].
Arachchige et al. [34] proposed an LDP-compliant mechanism, LATENT, to control privacy leaks in deep learning models (i.e., convolutional neural networks (CNNs)). Like other LDP frameworks, LATENT integrates a randomization layer (i.e., an LDP layer) to defend against untrusted learning servers. A big challenge of applying LDP in deep learning is that the sensitivity is extremely large. In LATENT, the sensitivity is l·r, where l is the length of the binary string and r is the number of layers of the neural network. To address this, Arachchige et al. further improved OUE [35], whose sensitivity is 2. They proposed modified OUE (MOUE), which has more flexibility in controlling the randomization of 1s and increases the probability of keeping 0 bits in their original state. Besides, Arachchige et al. proposed the utility-enhancing randomization (UER) mechanism, which further improves the utility of the randomized binary strings.
Furthermore, using the teacher-student paradigm, Zhao [148] investigated the distributed deep learning model under DP and further considered allowing a personalized choice of privacy parameters for each distributed data entity under LDP. Xu et al. [149] applied LDP to a deep-inference-based edge computing framework to privately build complex deep learning models. Overall, deep learning with LDP is in the early stage of research. Further research is still needed to provide strong privacy, handle high dimensionality, and improve accuracy.

Reinforcement Learning
Reinforcement learning enables an agent to learn a model interactively, which was widely adopted in artificial intelligence (AI). However, reinforcement learning is vulnerable to potential attacks, leading to serious privacy leakages [150].
To protect user privacy, Gajane et al. [151] initially studied the multi-armed bandits (MAB) problem with LDP. They proposed a bandit algorithm with LDP aiming at arms with Bernoulli rewards. Afterward, Basu et al. [152] proposed a unifying set of fundamental privacy definitions for MAB algorithms with the graphical model and LDP model. They have provided both the distribution-dependent and distribution-free regret lower bounds.
As for distributed reinforcement learning, Ono and Takahashi [153] proposed a framework Private Gradient Collection (PGC) to privately learn a model based on the noisy gradients. Under the PGC, each local agent reports the perturbed gradients that satisfy LDP to the central aggregator who will update the global parameters. Besides, Ren et al. [154] mainly investigated on the regret minimization for MAB problems with LDP and proved a tight regret lower bound. They proposed two algorithms that achieve LDP based on Laplace perturbation and Bernoulli response, respectively.
Reinforcement learning plays an important role in AI [155]. LDP is a potential technique to prevent sensitive information from leakage in reinforcement learning. However, LDP-based reinforcement learning is still in its infancy.

Federated Learning
Federated learning (FL) is one of the core technologies for the development of a new generation of artificial intelligence (AI) [156][157][158]. It provides attractive collaborative learning frameworks for multiple data owners/parties [159]. Although FL itself can effectively balance the trade-off between utility and privacy for machine learning [160], serious privacy issues still arise when transmitting or exchanging model parameters. Therefore, LDP has been widely adopted in FL systems to provide strong privacy guarantees, such as in smart electric power systems [161] or over wireless channels [162].
The studies in [163,164] adopted global DP [22] to protect sensitive information in FL. However, since FL itself is a distributed learning framework, LDP is more appropriate for FL systems. Truex et al. [165] integrated LDP into an FL system for the joint training of deep neural networks. Their method can efficiently handle complex models and defend against inference attacks while achieving personalized LDP. Besides, Wang et al. [33] proposed FedLDA, an LDP-based latent Dirichlet allocation (LDA) model in the FL setting. FedLDA adopts a novel randomized response with prior, which ensures that the privacy budget is irrelevant to the dictionary size; its accuracy is greatly improved by an adaptive and non-uniform sampling process.
To improve the model fitting and prediction of the schemes, Bhowmick et al. [166] proposed a relaxed optimal LDP mechanism for private FL. Li et al. [167] introduced an efficient LDP algorithm for meta-learning, which can be applied to realize personalized FL. As for federated SGD, LDP has been adopted to prevent privacy leakage from gradients. However, an increase in the dimension d causes the privacy budget to decay rapidly and the noise scale to grow, which leads to poor accuracy of the learned model when d is large. Thus, Liu et al. [168] proposed FedSel, which selects only the most important top-k dimensions while stabilizing the learning process. In addition, Sun et al. [169] proposed to mitigate the privacy degradation by splitting and shuffling, which reduces the noise variance and improves accuracy.
Recently, Naseri et al. [170] proposed an analytical framework that empirically assesses the feasibility and effectiveness of LDP and centralized DP (CDP) in protecting FL. They showed that both LDP and CDP can defend against backdoor attacks, but neither performs well against property inference attacks.

Applications
This Section summarizes the wide applications of LDP in real practice and in the Internet of Things.

LDP in Real Practice
LDP has been applied to many real systems due to its power in privacy protection. Several large-scale deployments in industry are described below.
• RAPPOR in Google Chrome. As the first practical deployment of LDP, RAPPOR [27] was proposed by Google in 2014 and integrated into Google Chrome to continuously collect statistics of Chrome usage (e.g., homepage and search engine settings) while protecting users' privacy. By analyzing the distribution of these settings, malicious software that tampers with the settings without user consent can be targeted. Furthermore, a follow-up work [87] by the Google team extended RAPPOR to collect more complex statistics when there is no prior knowledge of the dictionary.
• LDP in iOS and MacOS. Apple has deployed LDP in iOS and MacOS to collect typing statistics (e.g., emoji frequency detection) while providing privacy guarantees for users [26,85,171]. The deployed algorithm uses the Fourier transformation and sketching techniques to realize a good trade-off between massive learning and high utility.
• LDP in Microsoft systems. Microsoft has also adopted LDP, deploying it starting with Windows Insiders in the Windows 10 Fall Creators Update [28]. This deployment collects telemetry data statistics (e.g., histogram and mean estimations) across millions of devices over time. Both the α-point rounding and memoization techniques are used to prevent privacy leakage from accumulating as time goes on.
• LDP in SAP HANA. SAP announced that its database management system SAP HANA has integrated LDP to provide privacy-enhanced processing capabilities [172] since May 2018. There are two reasons for choosing LDP. One is to provide privacy guarantees when computing counts, sums, and averages on the database. The other is to ensure maximum transparency of the privacy-enhancing methods, since LDP avoids the trouble and overhead of maintaining a privacy budget.
Many other companies (e.g., Mozilla for Firefox [84] and Samsung [91]) also plan to build LDP-compliant systems to collect the usage statistics of users while providing strong privacy guarantees. Great effort is still needed to research large-scale, efficient, and accurate privacy-preserving frameworks for monitoring the behaviors of client devices.

LDP in Various Fields
With the rapid development of the Internet of Things (IoT), multiple IoT applications generate big multimedia data related to user health, traffic, city surveillance, locations, etc. [173]. These data are collected, aggregated, and analyzed to facilitate the IoT infrastructures. However, privacy leakage hinders the development of IoT systems. Therefore, LDP plays an important role in protecting data privacy in the Internet of Things.
Usman et al. [174] proposed a privacy-preserving framework, PAAL, which combines authentication and aggregation with LDP. PAAL can provide each end-device user with strong privacy guarantees by perturbing the aggregated sensitive information. Ou et al. [175] also adopted LDP to prevent adversaries from inferring the classification of household appliances' time-series data. As a promising branch of IoT, the Internet of Vehicles (IoV) has stimulated the development of vehicular crowdsourcing applications, which also results in unexpected privacy threats to vehicle users. Zhao et al. [176] adopted both LDP and FL models to avoid sensitive information leakage in IoV applications. Besides, the work in [39] provided a detailed summary of the applications of LDP in the Internet of connected vehicles.
In what follows, we summarize more specific applications of LDP in the Internet of Things.

Edge Computing
LDP itself is a distributed privacy model that can easily provide strict privacy guarantees in edge computing applications. Xu et al. [149] proposed a lightweight edge computing framework based on deep inference that achieves LDP for mobile data analysis. Moreover, Song et al. [177] also leveraged LDP models to protect the privacy of multi-attribute data in edge computing. They solved the problem of maximizing data utility under privacy budget constraints, which greatly improves accuracy.

Hypothesis Testing
Duchi et al. [115] were the first to define the canonical hypothesis testing problem under LDP and to study the error bound on the probability of error in hypothesis testing. From the perspective of information theory, Kairouz et al. [63,182] studied the maximization of f-divergence utility functions under LDP constraints and showed that the effective sample size reduces from N to ε²N for hypothesis testing.
Both studies in [183,184] investigated hypothesis testing with LDP and presented the asymptotic power and sample complexity. Sheffet [183] gave a characterization of hypothesis testing with LDP and proved sample complexity bounds for both identity testing and independence testing when using randomized response techniques. Gaboardi and Rogers [184] focused on goodness-of-fit and independence hypothesis tests under LDP. They designed three different goodness-of-fit tests that use different protocols to achieve LDP and guarantee convergence to a chi-square distribution. Afterward, Gaboardi et al. [185] further provided upper and lower bounds for mean estimation under (ε, δ)-LDP and showed the performance of an LDP-compliant Z-test algorithm. Moreover, Acharya et al. [186] presented locally private distribution testing algorithms with optimal sample complexity, improving on the sample complexity bounds in [183].

Social Network
In the Internet of Things, social network analysis has drawn much attention from many parties. Providing users with effective privacy guarantees is a key prerequisite for analyzing social network data. Therefore, privacy-preserving social network publishing has been widely investigated with LDP. Usually, a social network is formalized as a graph.
A big challenge is how to apply LDP to the aggregation and generation of complex graph structures. Qin et al. [187] were the first to formulate the problem of generating synthetic decentralized social graphs with LDP. In order to capture graph structure from simple statistics, they proposed LDPGen, which incrementally identifies clusters of connected nodes under LDP to capture the structure of the original graph. Zhang et al. [188] adopted the idea of multi-party-computation clustering to generate a graph model under an optimized RR algorithm. Besides, Liu et al. [189] used perturbed local communities to generate a synthetic network that maintains the original structural information.
The studies in [190,191] focus on online social network (OSN) publishing with LDP. The key idea of [190,191] is graph segmentation, which reduces both the noise scale and the output space. Specifically, they split the original graph to generate representative hierarchical random graphs (HRGs) and then perturbed each local graph with LDP. Yang et al. [192] also applied a hierarchical random graph model for publishing hierarchical social networks with LDP. They further employed Markov chain Monte Carlo to reduce the possible output space and improve efficiency when extracting the HRG with LDP. In addition, Ye et al. [193] focused on building an LDP-enabled graph metric estimation framework that supports a variety of graph analysis tasks, such as graph clustering, synthetic graph generation, and community detection. They proposed to compute the most popular graph metrics using only three parameters, i.e., the adjacency bit vector, the adjacency matrix, and the node degree.
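To illustrate the kind of primitive such frameworks build on, the sketch below perturbs a node's adjacency bit vector with Warner's randomized response and lets the aggregator recover an unbiased degree estimate. This is a minimal illustration under assumed parameters (n = 10000 nodes, ε = 2), not the actual framework of [193].

```python
import math
import random

def perturb_adjacency(bits, eps, rng):
    """Warner's randomized response on each adjacency bit:
    keep the bit with probability p = e^eps / (e^eps + 1), flip otherwise."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_degree(noisy_bits, eps):
    """Unbiased degree estimate from a perturbed adjacency bit vector:
    E[ones] = n*(1-p) + d*(2p-1), solved for d."""
    n = len(noisy_bits)
    p = math.exp(eps) / (math.exp(eps) + 1)
    ones = sum(noisy_bits)
    return (ones - n * (1 - p)) / (2 * p - 1)

rng = random.Random(42)
n, eps, true_degree = 10000, 2.0, 300
bits = [1] * true_degree + [0] * (n - true_degree)   # one node's adjacency vector
noisy = perturb_adjacency(bits, eps, rng)
d_hat = estimate_degree(noisy, eps)
```

Note the estimator's variance grows with the vector length n, which is one reason degree estimation over large graphs under LDP is noisy.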
Furthermore, social networks are decentralized in nature. In addition to her own connections, a participant's local view also contains the connections of her neighbors, which are private for the neighbors but not directly private for the participant herself. In this case, Sun et al. [194] showed that general LDP is insufficient to protect the privacy of all participants. Therefore, they formulated a more stringent definition, decentralized differential privacy (DDP), which ensures the privacy of both the participant and her neighbors. Besides, Wei et al. [195] further proposed AsgLDP for generating decentralized attributed graphs with LDP. Specifically, AsgLDP is composed of two phases, used for unbiased information aggregation and attributed graph generation, respectively, which preserves the important properties of social graphs.
As for social network analysis, it is hard to obtain a global view of the network since each user's local view is quite limited. Most existing methods address this challenge by partitioning users into disjoint groups (e.g., [187,188]). However, the performance of this approach is severely restricted by the number of users. Besides, the computational cost should also be considered for large-scale graphs.

Recommendation System
The recommendation system makes most of the services and applications in the Internet of Things feasible [196]. However, the recommendation system may abuse user data and extract private information when collecting the rating pairs from each user [197].
When integrating privacy-preserving techniques into a recommendation system, a big challenge is how to balance privacy and usability. Liu et al. [198] proposed an unobtrusive recommendation system that balances privacy and usability by crowdsourcing user privacy settings and generating corresponding recommendations. To provide stronger privacy guarantees, Shin et al. [199] proposed LDP-based matrix factorization algorithms that protect both users' items and ratings. Meanwhile, they used dimensionality reduction to shrink the domain space, which improves data utility. Jiang et al. [200] proposed a more reliable Secure Distributed Collaborative Filtering (SDCF) framework that is able to preserve the privacy of data items, the recommendation model, and the existence of ratings at the same time. Nonetheless, SDCF performs RAPPOR in each iteration to achieve LDP protection, which leads to a large perturbation error and low accuracy. To ensure accuracy, Guo et al. [201] proposed to reconstruct collaborative filtering based on similarity scores, which greatly improves the trade-off between privacy and utility.
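As a rough sketch of the gradient-perturbation idea behind LDP-based matrix factorization (in the spirit of [199], though not their actual algorithm), the snippet below has a user clip the gradient of her squared rating error with respect to an item vector and add Laplace noise calibrated to the clipped L1 sensitivity. The rating, vectors, and parameters are all illustrative assumptions.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_item_gradient(rating, user_vec, item_vec, eps, clip, rng):
    """One user's noisy gradient of (r - <u, v>)^2 with respect to the item
    vector v. Each coordinate is clipped to [-clip, clip], so the L1
    sensitivity is at most 2 * clip * d; Laplace noise of scale
    2 * clip * d / eps then gives eps-LDP for this single report."""
    err = rating - sum(a * b for a, b in zip(user_vec, item_vec))
    grad = [-2 * err * a for a in user_vec]
    grad = [max(-clip, min(clip, g)) for g in grad]
    d = len(grad)
    scale = 2 * clip * d / eps
    return [g + laplace(scale, rng) for g in grad]

rng = random.Random(7)
noisy_g = private_item_gradient(4.0, [0.5, 0.1], [0.3, 0.2],
                                eps=1.0, clip=1.0, rng=rng)
```

The server would average such noisy gradients over many users before updating the item vector; the per-user noise is large, but it cancels in the average, which is exactly why LDP learning needs a large population.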
Although many studies have investigated recommendation systems with LDP, the low accuracy caused by high-dimensional and highly sparse rating datasets still remains a huge challenge [202]. Besides, targeted LDP protocols for recommendation systems need further study.

Discussions and Future Directions
This section presents some discussions and research directions for LDP.

Strengthen Theoretical Underpinnings
Several theoretical underpinnings of LDP need to be further strengthened. (1) The lower bound on accuracy is not very clear. Duchi et al. [115] showed a provably optimal estimation procedure under LDP. However, the lower bounds on the accuracy of other LDP protocols remain to be established rigorously. (2) Can the sample complexity under LDP be further reduced? One limitation of LDP algorithms is that the number of users must be substantially large to ensure data utility.
A general rule of thumb [27] is ∏_{i=1}^{d} |Ω_i| ∝ √N/10, where |Ω_i| is the domain size of the ith attribute and N is the data size. Some studies [56,67] have focused on scenarios with a small number of users and tried to reduce the sample complexity for all privacy regimes. (3) There is relatively little research on relaxations of LDP (i.e., (ε, δ)-LDP). It remains to be theoretically studied whether such a relaxation yields improvements in utility or other factors [81]. (4) More general variant definitions of LDP deserve further study. Several novel and stricter LDP-based definitions have been proposed, but each targets a specific type of dataset. For example, d-privacy is only for location datasets, and decentralized differential privacy (DDP) is only for graph datasets.
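The rule of thumb above can be read as a lower bound on the required number of users. Below is a minimal sketch under the assumption that the rule is interpreted as N ≈ (10 · ∏|Ω_i|)²; the function name and example domain sizes are our own.

```python
def min_users_rule_of_thumb(domain_sizes):
    """Heuristic only (assumed reading of the rule of thumb): the product of
    attribute domain sizes should be about sqrt(N)/10, i.e. the data size N
    should be at least (10 * prod)^2."""
    prod = 1
    for s in domain_sizes:
        prod *= s
    return (10 * prod) ** 2

# Two binary attributes and one 4-valued attribute: domain product 16,
# so roughly (10 * 16)^2 = 25,600 users are needed by this heuristic.
n_needed = min_users_rule_of_thumb([2, 2, 4])
```

Even this small example shows how quickly the required population grows with the joint domain, which motivates the sample-complexity research cited above.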

Overcome the Challenge of Knowing the Data Domain
As shown in Section 3, most classical LDP-based frequency estimation mechanisms, such as PrivSet [76] and LDPMiner [30], need to know the domain of attributes in advance. However, it is often unreasonable and hard to make such an assumption about the attribute domain. For example, when estimating statistics over users' input words, the domain of words is very large and new words may appear over time. Both studies in [87,88] try to address this problem. However, their error bounds still depend on the size of the character domain and the number of trie nodes, which limits data utility when the data domain is large. Thus, estimating statistics without knowing the data domain in advance remains an open challenge.

Focus on Data Correlations
One limitation of existing methods is the neglect of data correlations. Such correlations appear among multiple attributes [105], in repeatedly collected data [27], or in evolving data [28], and they can leak additional information about users. However, many LDP mechanisms neglect such inadvertent correlations, leading to a degradation of privacy guarantees. Some studies [90,93] focused on the correlations in key-value data. The research in [28,113] focused on the temporal correlations of evolving data. However, how to learn data correlations privately and how to integrate such correlations into the encoding principles of LDP protocols remain open problems.

Address High-Dimensional Data Analysis
Privacy-preserving analysis of high-dimensional data suffers from high computation/communication costs and low data utility, which manifests in two aspects when using LDP. The first is computing joint probability distributions [72,105] (or k-way marginals [31]). In this case, the total domain size increases exponentially, which leads to huge computing costs and low data utility due to the "curse of dimensionality". The second is protecting the high-dimensional parameters of learning models in machine learning, deep learning, or federated learning tasks [168]. In this case, the scale of the injected noise is proportional to the dimension, so heavier noise is injected and the resulting models become inaccurate. Therefore, addressing high-dimensional data analysis is an urgent concern for the future.
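The second issue can be made concrete with a back-of-the-envelope calculation: under the standard Laplace mechanism, if each of the d coordinates of a report can change by a bounded amount, the L1 sensitivity (and hence the per-coordinate noise scale) grows linearly in d. The sketch below is illustrative only; the function name and numbers are our own.

```python
def laplace_scale_per_coord(eps, d, per_coord_sensitivity):
    """For a d-dimensional report whose coordinates each change by at most
    per_coord_sensitivity, the L1 sensitivity is d * per_coord_sensitivity,
    so the Laplace mechanism adds noise of scale d * s / eps per coordinate."""
    return d * per_coord_sensitivity / eps

# Growing the dimension from 10 to 100 grows the per-coordinate noise
# scale by the same factor of 10, at a fixed privacy budget.
low = laplace_scale_per_coord(1.0, 10, 1.0)
high = laplace_scale_per_coord(1.0, 100, 1.0)
```

This linear blow-up is why dimensionality reduction and coordinate-sampling tricks are common in LDP learning systems.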

Adopt Personalized/Granular Privacy Constraints
Since different users or parties may have distinct privacy requirements, it is more appropriate to design personalized or granular LDP algorithms that protect data with distinct sensitivity levels. As a distributed privacy notion, LDP can easily accommodate personalized/granular privacy protection. Some existing work [57,203] proposed personalized LDP-based frameworks for private histogram estimation. Gu et al. [59] presented Input-Discriminative LDP (ID-LDP), a fine-grained privacy notion that reflects the distinct privacy requirements of different inputs. However, adopting personalized/granular privacy constraints still raises further concerns when considering complex system architectures and ensuring good data utility.

Develop Prototypical Systems
LDP has been widely adopted for many analytic tasks and implemented in many real applications, such as Google Chrome [27] and federated learning systems [165]. However, few prototypical systems based on LDP have appeared so far. By developing prototypical systems, we can further improve LDP algorithms based on user requirements and running results in an interactive way.

Conclusions
Data statistics and analysis have greatly facilitated the progress and development of the information society in the Internet of Things. As a strong privacy model, LDP has been widely adopted to protect users against information leaks while collecting and analyzing their data. This paper presents a comprehensive review of LDP, including privacy models, data statistics and analysis tasks, enabling mechanisms, and applications. We systematically categorize the data statistics and analysis tasks into three aspects: frequency estimation, mean estimation, and machine learning. For each category, we summarize and compare the state-of-the-art LDP-based mechanisms from different perspectives. Meanwhile, several applications in real systems and the Internet of Things are presented to demonstrate how LDP can be implemented in real-world scenarios. Finally, we explore and conclude several future research directions.

Conflicts of Interest:
The authors declare no conflict of interest.