Analysis of Application Examples of Differential Privacy in Deep Learning

Artificial intelligence is now widely applied, and the privacy leakage problems that follow have drawn increasing attention. Attacks on deep neural networks, such as model inference attacks, can easily extract user information from the networks. It is therefore necessary to protect privacy in deep learning. Differential privacy, a popular topic in privacy preservation in recent years that provides a rigorous privacy guarantee, can also be used to preserve privacy in deep learning. Although many articles have proposed different methods to combine differential privacy and deep learning, no comprehensive paper analyzes and compares the differences and connections between these techniques. To this end, this paper compares different differentially private methods in deep learning. We comparatively analyze and classify several deep learning models under differential privacy. We also examine the application of differential privacy in Generative Adversarial Networks (GANs), comparing and analyzing these models. Finally, we summarize the application of differential privacy in deep neural networks.


Introduction
In recent years, deep learning based on neural networks has developed rapidly and has been successfully applied in many fields, such as image classification [1], natural language processing [2], face recognition [3,4], interpretable mechanism learning [5], and recommendation systems [6,7]. Deep neural networks learn from large amounts of training data. However, research on model inference attacks [8] and model inversion attacks [9] has made it easier to extract user information from the training dataset. These training samples may contain sensitive information, such as medical records, property information, biological information, and social relationships. Once leaked, such information can harm users [10]. In the era of big data, users generate large amounts of data every day. Once user information is collected, users often cannot control how it is used or shared. This requires application vendors to provide policies and techniques to protect user privacy. There are many methods for protecting user information, such as k-anonymity [11], homomorphic encryption [12], L-diversity [13], and secure multiparty computation [14]. Most of these methods desensitize the data or encrypt it into ciphertext [15], but they are not effective against some particular attacks. For example, when the same query f = "how many boys are in the dataset" is run on the first 99 records and on the first 100 records of a dataset, the sex of the 100th person can be learned by comparing the two results. This is the so-called differential attack. Differential privacy was designed to defend against this attack and provides a rigorous privacy guarantee [16]. It protects privacy by adding noise to the dataset or to the results of a query function so that the query result does not change significantly when a particular record is added or removed [17]. The proposal of differential privacy was a breakthrough in privacy preservation.
Differential privacy can ensure that an attacker cannot obtain private information about any individual record. Differential privacy has been widely used in machine learning [18][19][20][21]. In deep learning, it is mostly applied by adding noise during stochastic gradient descent, as in [22][23][24][25]. Combining differential privacy with deep learning provides new ideas for privacy preservation in deep neural networks [26]. Although many papers have proposed different methods to protect privacy in deep learning, no comprehensive paper analyzes and compares the different techniques. To this end, this paper compares different models that combine differential privacy and deep learning. After comparing and analyzing them, we classify these methods to help beginners quickly understand the field. The rest of the paper is organized as follows. We first introduce differential privacy and deep neural networks, as well as the types of attacks that neural networks may suffer from. We classify the models into three categories and provide a detailed introduction and comparative analysis in Section 5. We also found that differential privacy can be applied to Generative Adversarial Networks (GANs); two typical methods are introduced and compared in Section 6. Finally, conclusions and discussion are given in Section 7.

Preliminaries
2.1. Differential Privacy. Differential privacy is a concept proposed by C. Dwork [27] in 2006 to protect statistical databases from differential attacks. For example, if a simple query f = "how many boys are in the database" is run on the first 99 rows and on the first 100 rows of the data, the gender of the 100th person can be inferred, which leaks user privacy. Differential privacy guarantees that the output does not change significantly when an individual's information is added to or removed from the database.
Definition 1. Let K be a randomized algorithm and let S be any set of possible outputs of K. If, for any two datasets D and D′ that differ in at most one record,
$$\Pr[K(D) \in S] \le e^{\varepsilon} \cdot \Pr[K(D') \in S], \tag{1}$$
then algorithm K provides ε-differential privacy. The parameter ε > 0 is usually set to a small value such as 0.01, 0.1, ln 2, or ln 3.
In other words, if the algorithm is run on any pair of adjacent datasets, the probabilities of obtaining a specific output are similar, and the algorithm is then said to achieve differential privacy. This means that an observer can hardly detect a small change in the dataset by observing the output. This method therefore protects privacy to a certain extent.
In Definition 1, ε is the privacy budget [28], which represents the level of privacy protection provided by algorithm K. The smaller ε is, the higher the level of privacy protection. There are two commonly used privacy budget composition theorems: sequential composition [29] and parallel composition [30].
Theorem 1 (sequential composition). Suppose that a set of privacy mechanisms M = {M_1, ..., M_m} is applied in sequence to a dataset and each M_i provides ε_i-differential privacy. Then M provides $(\sum_{i=1}^{m} \varepsilon_i)$-differential privacy; when every ε_i equals the same value ε, this is (m · ε)-differential privacy.
Theorem 2 (parallel composition). Suppose that there is a set of privacy mechanisms M = {M_1, ..., M_m}. If each M_i provides ε_i-differential privacy on a disjoint subset of the entire dataset, then M provides $\max(\varepsilon_1, \ldots, \varepsilon_m)$-differential privacy.
Definition 2. Let K be a randomized algorithm and let S be any set of possible outputs of K. If, for any two datasets D and D′ that differ in at most one record,
$$\Pr[K(D) \in S] \le e^{\varepsilon} \cdot \Pr[K(D') \in S] + \delta, \tag{2}$$
then algorithm K provides (ε, δ)-differential privacy. When δ = 0, algorithm K provides ε-differential privacy.
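The two composition theorems reduce to simple arithmetic on the per-mechanism budgets. The following minimal Python sketch illustrates this; the function names and the example budgets are hypothetical and chosen only for illustration.

```python
def sequential_budget(epsilons):
    """Total budget when every mechanism runs on the same dataset (Theorem 1)."""
    return sum(epsilons)


def parallel_budget(epsilons):
    """Total budget when every mechanism runs on its own disjoint subset (Theorem 2)."""
    return max(epsilons)


# Hypothetical per-mechanism budgets, for illustration only.
budgets = [0.1, 0.2, 0.5]
seq_total = sequential_budget(budgets)
par_total = parallel_budget(budgets)
print("sequential:", seq_total)
print("parallel:", par_total)
```

Under these assumptions, running the three mechanisms on the same data costs the sum of the budgets, while running them on disjoint partitions costs only the largest one.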

Definition 3.
For a query function f: D ⟶ R^d, where R^d is the space of d-dimensional real vectors, and adjacent datasets D and D′, the sensitivity of f is defined as
$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1.$$
Sensitivity determines how much noise a particular query requires in the mechanism. It depends only on the type of query and measures the maximum difference between query results on adjacent datasets. Differential privacy has different implementation mechanisms for different algorithms. The two most commonly used are the Laplace mechanism and the exponential mechanism [31]. The Laplace mechanism is often used to protect numerical results, while the exponential mechanism is suitable for nonnumeric results.

Laplace Mechanism.
The density function of Laplace noise with scale b is
$$\mathrm{Lap}(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right),$$
with variance 2b^2.

Definition 4.
For a function f: D ⟶ R applied to the dataset D, the Laplace mechanism M, which provides ε-differential privacy, is defined as
$$M(D) = f(D) + \mathrm{Lap}\left(\frac{\Delta f}{\varepsilon}\right).$$
For a simple query f = "how many records in the dataset satisfy property P," the sensitivity by the above definition is Δf = 1. In the Laplace mechanism, let b = 1/ε; then the noise density at z is proportional to e^{−ε|z|}, and it attains its maximum at 0. For any z and z′ with |z − z′| ≤ 1, the density at z is at most e^ε times the density at z′, which satisfies the definition of differential privacy [32].
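To make the counting-query example concrete, the following Python sketch releases a count with Laplace noise and replays the differential attack on the first 99 and the first 100 records. It is only an illustration: the toy dataset, the helper names, and the choice ε = 0.5 per query are assumptions, and the Laplace sample is drawn as the difference of two exponential variables.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Counting query (sensitivity 1) released with Lap(1/epsilon) noise."""
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
# Hypothetical toy dataset: 100 records, each labelled "boy" or "girl".
records = [rng.choice(["boy", "girl"]) for _ in range(100)]


def is_boy(record):
    return record == "boy"


# Without noise, the difference of the two counts reveals the 100th record.
exact_gap = sum(map(is_boy, records)) - sum(map(is_boy, records[:99]))
# With noise, the difference is dominated by the Laplace noise.
noisy_gap = (private_count(records, is_boy, 0.5, rng)
             - private_count(records[:99], is_boy, 0.5, rng))
print("exact gap:", exact_gap, "noisy gap:", round(noisy_gap, 2))
```

The exact gap is always 0 or 1 and identifies the last record, whereas the noisy gap varies over a range several times larger than 1, so a single comparison no longer reveals it.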

Exponential Mechanism.
Let the output domain of the query function be R, where each value r ∈ R is an entity object. Under the exponential mechanism, the real-valued function q(D, r) is called the availability function of the output value r and is used to evaluate how good r is.
Definition 5. Let M be a randomized algorithm with input dataset D and output r ∈ R, let q(D, r) be the availability function of the exponential mechanism, and let Δq be the sensitivity of q(D, r). If M satisfies
$$\Pr[M(D) = r] \propto \exp\left(\frac{\varepsilon\, q(D, r)}{2\Delta q}\right),$$
that is, if M selects and outputs r from R with probability proportional to exp(εq(D, r)/2Δq), then M provides ε-differential privacy.
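As an illustration of Definition 5, the sketch below samples a nonnumeric output with probability proportional to exp(εq(D, r)/2Δq). The candidate set, the counts used as the availability function, and all names are hypothetical.

```python
import math
import random


def exponential_mechanism(candidates, quality, epsilon, sensitivity, rng):
    """Pick one candidate with probability proportional to
    exp(epsilon * quality(r) / (2 * sensitivity))."""
    scores = [epsilon * quality(r) / (2.0 * sensitivity) for r in candidates]
    top = max(scores)  # shift scores so that exp() cannot overflow
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical example: release the most common diagnosis in a dataset.
# The availability function is the count of each diagnosis (sensitivity 1).
diagnosis_counts = {"flu": 30, "cold": 25, "asthma": 5}
rng = random.Random(0)
choice = exponential_mechanism(
    list(diagnosis_counts), diagnosis_counts.get,
    epsilon=1.0, sensitivity=1.0, rng=rng,
)
print("released value:", choice)
```

Candidates with higher availability are exponentially more likely to be released, but every candidate keeps a nonzero probability, which is what hides the effect of any single record.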

Deep Neural Network.
A deep neural network (DNN) is very effective for many machine learning tasks. DNNs are neural networks with many hidden layers. The layers inside a DNN can be divided into the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers [33]. The training process of a DNN is divided into forward propagation and backpropagation. Forward propagation uses the weight matrix W and the bias vector b to perform a series of linear operations and activation operations on the input vector x. Starting from the input layer, the output of each layer is used to compute the output of the next layer, and the computation proceeds layer by layer until it reaches the output layer. The activation function makes the relationship between the input and the output nonlinear, which allows the DNN to approximate almost any function and makes it more powerful. The sigmoid, ReLU, and tanh functions are commonly used as activation functions.
The loss function of a DNN represents the error between the output and the actual result, and the performance of the model can be roughly judged by it. A large value of the loss function indicates that the model does not fit well and that the coefficients need to be adjusted. The DNN adjusts the coefficients automatically through the backpropagation algorithm. Backpropagation compares the output with the actual result, calculates the error between them, and propagates the error from the output layer back to the input layer. During backpropagation, the weight parameters are adjusted according to the error so that the total loss is reduced.
The use of a deep neural network is divided into two phases, the training phase and the testing phase. The training phase is the process of forward propagation and backpropagation described above. After the model is trained, it is applied to the test dataset to see how it performs there.
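The forward and backward passes described above can be sketched for a network with a single hidden layer. This is a minimal NumPy illustration under assumed settings (synthetic data, tanh hidden units, a sigmoid output, mean squared error, a fixed learning rate); none of these choices come from the models surveyed here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: the label is 1 when the three features sum to a positive value.
X = rng.normal(size=(64, 3))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

# Weight matrices W and bias vectors b of the hidden and output layers.
W1 = rng.normal(scale=0.5, size=(3, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)


def forward(inputs):
    """Forward propagation: linear operation, then activation, layer by layer."""
    hidden = np.tanh(inputs @ W1 + b1)
    output = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))  # sigmoid
    return hidden, output


learning_rate = 0.3
losses = []
for step in range(300):
    hidden, output = forward(X)
    losses.append(float(np.mean((output - y) ** 2)))
    # Backpropagation: push the error from the output layer toward the input.
    d_out = 2.0 * (output - y) / len(X) * output * (1.0 - output)
    grad_W2, grad_b2 = hidden.T @ d_out, d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1.0 - hidden ** 2)
    grad_W1, grad_b1 = X.T @ d_hidden, d_hidden.sum(axis=0)
    # Adjust the coefficients so that the total loss decreases.
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1

print("first loss:", losses[0], "last loss:", losses[-1])
```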

Privacy Leakage and Attacks in Deep Learning.
Privacy protection in machine learning can be analyzed in terms of confidentiality, integrity, and availability [34]. Attacks on confidentiality may expose the structure or parameters of a neural network model, or the data used to train and test it; attacks on integrity may affect the privacy of data sources; attacks on availability attempt to prevent legitimate users from accessing meaningful model output or the functions of the system itself. Training a neural network requires a large amount of data, which may contain users' private information, such as medical records and property information. Paper [35] shows that user information can be extracted effectively from a neural network. Both the training and testing phases of deep neural networks may leak privacy. While the coefficient matrix is being tuned in the training phase, the training dataset may be manipulated by the attacker. This is called a poisoning attack [36]. Poisoning attacks alter the training dataset by inserting, editing, or deleting sample points, with the goal of modifying the decision boundary of the target model.
During the testing phase, the coefficients of the model have been determined, but attackers can still attack the model. There are three main types of attacks: model extraction attacks, model inversion attacks, and membership inference attacks. In a model extraction attack, the attacker infers the specific parameters or structure of the model from its test results. Assuming that the model has n parameters, the attacker can test the model with m (m > n) samples, form m linear equations from the input samples and the test results, and solve the equations to obtain the n parameter values [35].
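The equation-solving idea can be sketched as follows. The sketch assumes the simplest possible target, a linear model with noise-free outputs, so that each query yields one exact linear equation; the secret parameters, the query function, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical secret model with n = 6 parameters (5 weights and 1 bias).
secret_weights = rng.normal(size=5)
secret_bias = 0.7


def query_model(x):
    """Black-box access: the attacker sees only the output for an input x."""
    return float(secret_weights @ x + secret_bias)


# The attacker sends m = 8 > n queries and records the answers.
m = 8
queries = rng.normal(size=(m, 5))
answers = np.array([query_model(x) for x in queries])

# Each query gives one linear equation in the unknown weights and bias.
design = np.hstack([queries, np.ones((m, 1))])
solution, *_ = np.linalg.lstsq(design, answers, rcond=None)
recovered_weights, recovered_bias = solution[:-1], solution[-1]
print("max weight error:", np.max(np.abs(recovered_weights - secret_weights)))
```

For nonlinear models the equations are no longer linear, so this sketch shows only the principle, not a practical attack on a deep network.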
In a model inversion attack, the attacker extracts information related to the training data from the model's test results, such as sensitive characteristics of the training data [9]. In face recognition, an attacker can start from a randomly constructed picture, target a certain sample (such as Tom) in the training dataset, and use gradient descent to repeatedly modify the picture according to the model's prediction until it shows Tom's facial features.
Membership inference attacks aim to infer whether a particular record is in the training dataset [8]. There are black-box attacks and white-box attacks [22].

Computational Intelligence and Neuroscience
Black-box attacks: the attacker can only obtain the output of the model for arbitrary inputs. For any input x, the attacker obtains f(x; W) but cannot obtain the parameter matrix W of the model or the intermediate steps of the calculation. An attack against a black-box model is defined in [8]; it exploits the statistical difference between the model's predictions on the training set and on unseen data.
White-box attacks: for any input x, besides the output, the attacker can also obtain the structure and parameters of the model and observe the intermediate calculation steps in the hidden layers.

Differential Privacy in Deep Learning.
Because of privacy leakage in neural networks, these models need to be protected. Differential privacy offers a new way to protect privacy [37]. In this section, we analyze six differentially private models: DP-SGD, the improved DP-SGD, the Adaptive Laplace Mechanism, dPA, pCDBN, and PATE. We divide these models into three categories. The first category adds noise during the stochastic gradient descent of neural networks to achieve differential privacy; DP-SGD, the improved DP-SGD, and the Adaptive Laplace Mechanism belong to it. The second category is based on the functional mechanism, which achieves differential privacy by perturbing the objective function of the optimization problem instead of its result; dPA and pCDBN belong to it. The third category is a new framework that protects privacy through knowledge aggregation and transfer [38]; the PATE model belongs to it.
DP-SGD was proposed in [23]; this algorithm aims to control the influence of the training dataset on the training process, especially during the calculation of the gradient. It builds on [24]; compared with the algorithm in that paper, DP-SGD is modified and extended, especially in the calculation of the privacy budget. The main idea of DP-SGD is shown in Algorithm 1.
For the DP-SGD algorithm, an important issue is tracking the privacy budget during training. The paper [23] proposes a privacy accounting method, the "Moments Accountant," which can prove that the algorithm satisfies $(O(q\varepsilon\sqrt{T}), \delta)$-differential privacy. The bound on the privacy cost obtained with the Moments Accountant is tighter than the bound $(O(q\varepsilon\sqrt{T\log(1/\delta)}), Tq\delta)$ obtained with the strong composition theorem [40]: it saves a factor of $\sqrt{\log(1/\delta)}$ in the ε part and a factor of Tq in the δ part.
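The core update of DP-SGD (clip each per-example gradient, add Gaussian noise to the sum, average, and take a descent step) can be sketched in NumPy. This is a hedged illustration, not the implementation of [23]: the linear regression task, the hyperparameter values, and all function names are assumptions, and privacy accounting is left out entirely.

```python
import numpy as np


def clip_gradients(per_example_grads, clip_norm):
    """Scale each row so that its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    return per_example_grads / np.maximum(1.0, norms / clip_norm)


def dp_sgd_step(theta, per_example_grads, lr, clip_norm, noise_scale, group_size, rng):
    """One noisy update: clip, sum, add Gaussian noise, average, descend."""
    clipped = clip_gradients(per_example_grads, clip_norm)
    noise = rng.normal(0.0, noise_scale * clip_norm, size=theta.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / group_size
    return theta - lr * noisy_grad


rng = np.random.default_rng(0)
# Hypothetical regression data with N = 500 samples and group size L = 50.
N, d, L = 500, 3, 50
X = rng.normal(size=(N, d))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=N)

theta = np.zeros(d)
for t in range(300):
    # Each sample joins the batch independently with probability q = L / N.
    mask = rng.random(N) < L / N
    residual = X[mask] @ theta - y[mask]
    grads = residual[:, None] * X[mask]  # gradient of 0.5 * (x . theta - y)^2
    theta = dp_sgd_step(theta, grads, lr=0.1, clip_norm=1.0,
                        noise_scale=1.0, group_size=L, rng=rng)

print("learned parameters:", np.round(theta, 2))
```

Clipping bounds the influence of any single sample on the update by `clip_norm`, which is what allows the Gaussian noise scale to be calibrated independently of the data.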

Improved DP-SGD.
There are two main problems with the DP-SGD algorithm. First, implementations often use random shuffling to form batches for efficiency. This leads to a higher privacy loss, so the privacy loss calculated by the Moments Accountant is underestimated. Second, DP-SGD needs to iterate many times when calculating the privacy loss. To solve these problems, Lei Yu et al. [42] proposed a method for training neural networks under concentrated differential privacy (CDP) [43]. CDP makes privacy-preserving algorithms more practical than traditional DP when a large number of computations are involved, while still providing a strong privacy guarantee.
The paper [42] first proposes a dynamic privacy budget allocation technique and then develops zCDP-based privacy accounting methods for different batching methods.

Dynamic Privacy Budget Allocation.
For a given privacy budget, the accuracy of the final model depends on how the budget is allocated during training. Privacy budget allocation aims to optimize this allocation so as to obtain a differentially private DNN model with higher accuracy. The main idea is to reduce the noise added to the gradient as the model accuracy converges.
This brings the model closer to the optimal solution and yields higher accuracy. On this basis, [42] proposed two privacy budget allocation techniques: an adaptive schedule based on a public validation dataset and predefined schedules.

Adaptive Schedule Based on Public Validation Dataset.
The technique uses a public validation dataset to monitor the validation error during training and reduces the noise scale when the validation error stops improving. Specifically, whenever the improvement in validation accuracy is less than the threshold δ, the noise scale is reduced by a factor of k, until the privacy budget is exhausted.
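A minimal sketch of this adaptive rule is given below. It assumes that "reduced by a factor of k" means multiplying the noise scale by a decay factor 0 < k < 1, and it does not model the exhaustion of the privacy budget; the accuracy history and all names are hypothetical.

```python
def adaptive_noise_scales(val_accuracies, sigma0, threshold, k):
    """Return the noise scale in force after each validation check.

    The scale is multiplied by k (0 < k < 1) whenever the accuracy gain
    over the previous check is below `threshold`.
    """
    sigma = sigma0
    scales = []
    previous = None
    for accuracy in val_accuracies:
        if previous is not None and accuracy - previous < threshold:
            sigma *= k
        scales.append(sigma)
        previous = accuracy
    return scales


# Hypothetical validation accuracies after five epochs.
history = [0.50, 0.60, 0.605, 0.70, 0.701]
scales = adaptive_noise_scales(history, sigma0=4.0, threshold=0.01, k=0.5)
print(scales)
```

In this example the scale is halved after the third and fifth checks, where the accuracy gain falls below the threshold.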

Predefined Schedules.
This method does not use a validation dataset but predefines a decay function so that the noise scale decreases over time. The paper [42] mainly uses four decay functions to reduce the noise scale: Time-Based Decay, Exponential Decay, Step Decay, and Polynomial Decay.
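The four schedules can be sketched as follows. The exact formulas are not given in this text, so the forms below are assumptions borrowed from the decay schedules commonly used for learning rates; the parameter names (`sigma0`, `k`, `period`, `sigma_end`) are hypothetical.

```python
import math


def time_based_decay(sigma0, k, t):
    """Noise scale shrinks as 1 / (1 + k * t)."""
    return sigma0 / (1.0 + k * t)


def exponential_decay(sigma0, k, t):
    """Noise scale shrinks exponentially in t."""
    return sigma0 * math.exp(-k * t)


def step_decay(sigma0, k, t, period):
    """Noise scale is multiplied by k (0 < k < 1) once every `period` epochs."""
    return sigma0 * k ** (t // period)


def polynomial_decay(sigma0, sigma_end, k, t, period):
    """Noise scale moves from sigma0 to sigma_end over `period` epochs."""
    t = min(t, period)
    return (sigma0 - sigma_end) * (1.0 - t / period) ** k + sigma_end


for t in (0, 5, 10, 20):
    print(
        t,
        round(time_based_decay(4.0, 0.1, t), 3),
        round(exponential_decay(4.0, 0.1, t), 3),
        round(step_decay(4.0, 0.5, t, period=10), 3),
        round(polynomial_decay(4.0, 1.0, 2, t, period=20), 3),
    )
```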

Privacy Accountant
Definition 6. CDP considers the privacy loss on an outcome o of a randomized mechanism A operating on two adjacent databases D and D′:
$$L^{(o)}_{(A(D)\,\|\,A(D'))} = \log \frac{\Pr[A(D) = o]}{\Pr[A(D') = o]}.$$
(µ, τ)-CDP ensures that the mean of the privacy loss does not exceed µ and that the probability of the loss exceeding its mean by an amount of t · τ is bounded by e^{−t²/2} [43]. Bun and Steinke [44] proposed another form of (µ, τ)-CDP, called zero-concentrated differential privacy (zCDP).
Definition 7. A randomized mechanism A provides ρ-zCDP if, for any two datasets D and D′ that differ in only one record and for all α ∈ (1, ∞),
$$D_{\alpha}(A(D)\,\|\,A(D')) \le \rho\alpha,$$
where D_α denotes the Rényi divergence of order α.
Two propositions are proposed in [44].
Since zCDP and DP are comparable, the paper [42] proposes a privacy accounting method based on zCDP. By the sequential composition property of zCDP, if each iteration satisfies ρ-zCDP and the training process runs for T iterations, then the whole training process satisfies (Tρ)-zCDP. The paper analyzes the composition of privacy loss under two common batching methods: random sampling with replacement and random shuffling. Random sampling with replacement draws each sample at random from the training dataset. Random shuffling randomly shuffles the training dataset and splits it into batches of similar size, and SGD processes one batch at a time. Random shuffling is common practice; it performs better than random sampling and converges faster [45].
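The zCDP accounting above amounts to adding up the per-iteration ρ values. The sketch below illustrates this with two standard results from the zCDP literature [44] that are assumed here rather than stated in this text: the Gaussian mechanism with noise N(0, σ²) on a query of sensitivity Δ satisfies (Δ²/2σ²)-zCDP, and ρ-zCDP implies (ρ + 2√(ρ ln(1/δ)), δ)-DP. It ignores any effect of batch sampling, and the numeric values are hypothetical.

```python
import math


def gaussian_zcdp(sensitivity, sigma):
    """rho of one Gaussian mechanism step (assumed result from [44])."""
    return sensitivity ** 2 / (2.0 * sigma ** 2)


def compose_zcdp(rhos):
    """Sequential composition: the rho values of all iterations add up."""
    return sum(rhos)


def zcdp_to_dp(rho, delta):
    """Convert rho-zCDP to the epsilon of (epsilon, delta)-DP (assumed result from [44])."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


# Hypothetical training run: T = 100 iterations with a fixed noise scale.
T = 100
per_step = [gaussian_zcdp(sensitivity=1.0, sigma=4.0) for _ in range(T)]
total_rho = compose_zcdp(per_step)
epsilon = zcdp_to_dp(total_rho, delta=1e-5)
print("total rho:", total_rho, "epsilon:", round(epsilon, 3))
```

Because the per-step values are simply summed, the same accounting works when the noise scale changes from one iteration to the next, as in the dynamic budget allocation described earlier.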
With random shuffling, the privacy loss is tracked using Theorem 2. With random sampling with replacement, Theorem 4 is used,
where q = L/N is the probability with which a sample is randomly selected and P(·) and U(·) are functions of q and δ. Under different privacy budget allocation methods, given the batching method, the privacy loss can be calculated according to Theorem 2 or Theorem 3.

Adaptive Laplace Mechanism.
Paper [46] proposed an Adaptive Laplace Mechanism (AdLM) whose main idea is to add more noise to features that are less relevant to the model output. Laplace noise is injected into the features adaptively, according to the contribution of each feature to the model output. Unlike the method in [47], the noise and privacy budget injected by this method do not accumulate with each training step, so the consumption of the privacy budget is independent of the number of training steps. The Adaptive Laplace Mechanism is shown in Algorithm 2. Computing the loss function F_L(θ_t) satisfies ε_3-differential privacy; by the composition theorem [48], the whole algorithm provides (ε_1 + ε_2 + ε_3)-differential privacy. In addition, this mechanism can be used in various deep learning models, such as CNNs [49], deep autoencoders [33], and convolutional deep belief networks [50].
ALGORITHM 1: DP-SGD. Input: samples x_1, x_2, ..., x_N, learning rate η_t, loss function L(θ) = (1/N) Σ_i L(θ, x_i), group size L, noise scale σ, gradient norm bound C. Output: θ_T, together with the privacy cost (ε, δ) computed with the privacy accounting method.

2.11.2. Comparative Analysis.
Among the three models of the first category, DP-SGD uses gradient clipping and added Gaussian noise to achieve differential privacy. It also proposes the privacy accounting mechanism "Moments Accountant" (MA), which yields a small bound on the privacy loss. On this basis, Lei Yu et al. [42] improved the DP-SGD algorithm and proposed a dynamic privacy budget allocation technique. Based on the experiments in [42], the privacy accounting methods of DP-SGD and the improved DP-SGD are compared in Table 1. The privacy loss obtained by the MA method and the zCDP methods is smaller than that obtained with strong composition, indicating that both obtain a more accurate value of the privacy loss. zCDP with random shuffling (RF) gives a higher privacy loss than MA and zCDP with random sampling (RS), because RS introduces more uncertainty.
However, RF is more commonly used in deep neural networks, and the DP-SGD algorithm is also implemented with RF for batching. It follows that the MA method underestimates the privacy loss of DP-SGD.
In the improved DP-SGD, the noise on the gradient is reduced as the model accuracy converges, and zCDP-based privacy accounting methods are developed for different batching methods. The Adaptive Laplace Mechanism adds more noise to features that are less relevant to the model output: Laplace noise is injected into the features adaptively, according to the contribution of each feature to the model output. The accuracy of these three models is compared in Table 2.
On the MNIST dataset, the accuracy of all three models is high, exceeding 90%. When ε = 0.5, the accuracy of the AdLM model is slightly higher than that of DP-SGD, reaching 93.66%. On the CIFAR-10 dataset, the improved DP-SGD has lower accuracy, while DP-SGD and AdLM have higher accuracy. When ε = 8, the accuracy of AdLM is about 4% higher than that of DP-SGD. The accuracy of the improved DP-SGD differs only slightly across the privacy budget decay functions. In terms of convergence speed, DP-SGD is faster than AdLM. The improved DP-SGD was proposed to address the excessive iterations and the underestimation of privacy loss in the privacy accounting of DP-SGD; therefore, it converges faster than DP-SGD.
ALGORITHM 2: Adaptive Laplacian mechanism. (1) Compute the average relevance of each of the d input features by applying the LRP algorithm [41]. (2) Inject Laplace noise into the coefficients of the differentially private layer h_0. (3) Construct hidden layers h_1, h_2, ..., h_k and normalization layers η_1, η_2, ..., η_k. (4) Inject Laplace noise into the coefficients of the approximated loss function.

The Second Category
2.12.1. dPA. The deep autoencoder (dA) is one of the basic deep learning models and is widely used in natural language processing and other fields [33]. An autoencoder is an unsupervised learning algorithm mainly used for dimensionality reduction or feature extraction. In deep learning, it can be used to determine the initial values of the weight matrix before training starts. It encodes the high-dimensional input so that the compressed low-dimensional vector retains the characteristics of the input data. Phan et al. [51] proposed the deep private autoencoder (dPA), which achieves differential privacy by perturbing the objective function (not the result) of a traditional deep autoencoder. The algorithm improves on the functional mechanism (FM) [20]. Its steps are as follows.
Derive a polynomial approximation of the data reconstruction function RE(D, W), denoted as $\widehat{RE}(D, W)$. For a given encoding x_i, the reconstruction function of the autoencoder is transformed into this polynomial form by Taylor expansion.
Perturb the function $\widehat{RE}(D, W)$ by using the functional mechanism; the result is denoted as $\overline{RE}(D, W)$. To do so, calculate the sensitivity of $\widehat{RE}$ between the adjacent datasets D and D′, and perturb the function according to this sensitivity.
Compute $\overline{W} = \arg\min_{W} \overline{RE}(D, W)$ to obtain the initial weight matrix of the input layer.
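The role of the Taylor expansion can be illustrated on a single term. Cross-entropy style reconstruction errors are built from terms of the form log(1 + e^z), which are not polynomial in the weights; replacing each term by its second-order expansion around z = 0 gives a polynomial whose coefficients can then be perturbed. The sketch below shows only this approximation step for a scalar z; it is an assumed illustration, not the full reconstruction function of dPA.

```python
import math


def softplus(z):
    """log(1 + e^z), the building block of cross-entropy style losses."""
    return math.log1p(math.exp(z))


def softplus_taylor2(z):
    """Second-order Taylor expansion of softplus around z = 0."""
    return math.log(2.0) + z / 2.0 + z * z / 8.0


for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(z, round(softplus(z), 4), round(softplus_taylor2(z), 4))
```

The approximation is exact at z = 0 and stays close to the original function for small |z|, which is why the inputs to each layer need to be kept in a bounded range.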

Private Autoencoder (PA) Stacking.
With the initial weight matrix of the input layer fixed, apply the private autoencoder to each subsequent layer. The hidden units of the lower layer are used as the input of the next PA. To guarantee that this input to the next PA satisfies [...] layer is added on top of the hidden layer [...]. Derive and perturb the polynomial approximation of the cross-entropy error C(θ), denoted as $\widehat{C}(\theta)$: the cross-entropy error function of the softmax layer is transformed into polynomial form. Calculate the sensitivity $\Delta_C = |h_{(k)}| + \frac{1}{4}|h_{(k)}|^2$ between $\widehat{C}(Y_T, \theta)$ and $\widehat{C}(Y_{T'}, \theta)$, and perturb the cross-entropy loss function according to this sensitivity.
2.13.1. pCDBN. The private convolutional deep belief network (pCDBN) [52] is essentially a differentially private version of the convolutional deep belief network (CDBN) [50]. The method is similar to dPA in [51], with some differences. Because the global sensitivity of the CDBN under the functional mechanism cannot be derived and the approximation error bounds in the CDBN are difficult to identify, pCDBN uses Chebyshev polynomials to approximate the nonlinear objective function. Noise is then injected into these polynomials, and the functional mechanism is used to make the training phase of each hidden layer satisfy ε-differential privacy. After this transformation, each hidden layer becomes a private hidden layer; the private hidden layers are stacked, and the polynomial form of the cross-entropy error function of the softmax layer is derived and then perturbed to produce a private convolutional deep belief network. The algorithm steps are as follows.
Derive a polynomial approximation of the energy function E(D, W), denoted as $\widehat{E}(D, W)$.
Perturb the function $\widehat{E}(D, W)$ by using the functional mechanism [20]; the result is denoted as $\overline{E}(D, W)$. Stack the private hidden and pooling layers (H, P) to construct the pCDBN. Apply the technique presented in [51]: the cross-entropy error at the softmax layer used for classification and prediction tasks is transformed into polynomial form and then perturbed.
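The difference between a Chebyshev approximation and a Taylor expansion can be illustrated numerically. The sketch below fits a degree-2 Chebyshev interpolant to log(1 + e^z) on [−1, 1] and compares its maximum error with that of the second-order Taylor expansion around 0. The target function and the interval are assumptions chosen for illustration; they are not the energy function of the CDBN.

```python
import numpy as np
from numpy.polynomial import chebyshev


def softplus(z):
    """log(1 + e^z), used here as a stand-in nonlinear objective term."""
    return np.log1p(np.exp(z))


degree = 2
# Chebyshev nodes on [-1, 1]; interpolating at them keeps the maximum error small.
nodes = np.cos(np.pi * (np.arange(degree + 1) + 0.5) / (degree + 1))
coeffs = chebyshev.chebfit(nodes, softplus(nodes), degree)

grid = np.linspace(-1.0, 1.0, 201)
cheb_error = np.max(np.abs(chebyshev.chebval(grid, coeffs) - softplus(grid)))
taylor = np.log(2.0) + grid / 2.0 + grid ** 2 / 8.0
taylor_error = np.max(np.abs(taylor - softplus(grid)))
print("max Chebyshev error:", cheb_error)
print("max Taylor error:", taylor_error)
```

In this example the Chebyshev interpolant spreads its error over the whole interval, whereas the Taylor expansion is accurate near 0 and degrades toward the endpoints.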

Experiment
An experimental comparison between pCDBN, CDBN, dPAH (dPA for human behavior prediction), TCDBN (a simplified version of CDBN, without adding noise to the energy function approximation), and the conditionally restricted Boltzmann machine (SctRBM) [53] is shown in Figure 1. The accuracy of the models without privacy preservation remains basically unchanged. Because of the added noise, dPAH and pCDBN have lower accuracy than the CDBN and TCDBN models. The accuracy of pCDBN is higher than that of dPAH, and it is even higher than that of SctRBM, which has no added noise.
The paper [52] also compares the pCDBN model with the DP-SGD (pSGD) model in [23]. The experimental results are shown in Figure 2. When ε = 0.5, the pSGD model reached 88.75% accuracy at 18 epochs, and the pCDBN model reached 91.71% accuracy after 162 epochs, which is higher than the accuracy of the pSGD model.

Comparative Analysis.
In the second category, both models are based on the functional mechanism. dPA uses Taylor expansion to approximate the cross-entropy error function by a polynomial and then injects noise. pCDBN uses Chebyshev polynomials to derive polynomial approximations of the energy function and the nonlinear objective functions and then injects noise. The algorithms and ideas of the two models are basically similar, but they use different methods to transform the objective function into polynomial form. In its experimental part, [52] compared pCDBN with dPAH, a human behavior recognition model using dPA. The results, shown in Figure 1, indicate that the accuracy of pCDBN is higher than that of dPAH.

PATE
4.1.1. Framework. Some neural network models may inadvertently memorize private data, which creates a risk of privacy leakage. The PATE (Private Aggregation of Teacher Ensembles) algorithm proposed by Papernot et al. [54] provides a strong privacy guarantee for training data. The method combines, in a black-box manner, multiple models trained on disjoint datasets (such as records from different subsets of users). These models are not published but are used as "teachers" for a "student" model. The student learns to predict the output chosen by noisy voting among all teachers and has no direct access to any individual teacher or to the underlying data or parameters. The general framework of the PATE algorithm is shown in Figure 3, which is reproduced from Nicolas Papernot et al., 2017 [54].
The algorithm divides the training dataset into n disjoint groups and trains one model on each group; these n models are the "teacher" models. The predictions of the n teachers are combined by majority vote, and noise is added to the vote counts to perturb the result and protect privacy. For an input x, if most teachers agree, adding noise does not change the final prediction; if the teachers' predictions are split between two classes with the same number of votes, one of the two is output at random after the noise is added.
As the number of answered queries grows, the aggregated teacher model must add more noise, which eventually makes it useless. Moreover, if the adversary can access the parameters of the model, the privacy guarantee no longer holds. To solve these problems, the PATE algorithm introduces a "student" model. The student model is trained on nonsensitive, unlabeled data. Part of the unlabeled data is labeled by the teacher ensemble and is then used, together with the remaining unlabeled data, as the training dataset of the student model. Deploying the student model instead of the teacher ensemble gives a fixed privacy loss, whose value is determined by the number of queries made to the teachers while the student is trained. Therefore, even if the adversary obtains the architecture and parameters of the student model, user privacy remains protected. The PATE framework uses the Moments Accountant of [23] for its privacy analysis. At each step, Lap(1/γ) noise is added in the aggregation mechanism, which makes the step (2γ, 0)-DP; after T steps, the mechanism is $(4T\gamma^2 + 2\gamma\sqrt{2T\ln(1/\delta)}, \delta)$-DP. The accuracy of the student models on the MNIST and SVHN datasets is shown in Table 3. The PATE model achieves up to 98% accuracy on MNIST and more than 90% accuracy on SVHN.
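The noisy voting step and the resulting privacy bound can be sketched in Python. The sketch is illustrative only: the inverse noise scale of the Lap(1/·) noise in the text is named `gamma` here, the teacher votes are made up, and the bound is the simple composition formula quoted above rather than the tighter data-dependent analysis.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Lap(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_vote(teacher_votes, num_classes, gamma, rng):
    """Count the teachers' votes, add Lap(1/gamma) noise, return the top class."""
    counts = [0] * num_classes
    for vote in teacher_votes:
        counts[vote] += 1
    noisy = [count + laplace_noise(1.0 / gamma, rng) for count in counts]
    return max(range(num_classes), key=lambda label: noisy[label])


def pate_epsilon_bound(num_queries, gamma, delta):
    """Epsilon after T queries: 4*T*gamma^2 + 2*gamma*sqrt(2*T*ln(1/delta))."""
    return (4.0 * num_queries * gamma ** 2
            + 2.0 * gamma * math.sqrt(2.0 * num_queries * math.log(1.0 / delta)))


rng = random.Random(0)
# Hypothetical ensemble of 250 teachers: 240 vote for class 3, 10 for class 1.
votes = [3] * 240 + [1] * 10
label = noisy_max_vote(votes, num_classes=10, gamma=0.05, rng=rng)
print("label for the student:", label)
print("epsilon after 100 queries:", round(pate_epsilon_bound(100, 0.05, 1e-5), 3))
```

When the teachers agree strongly, as in this example, the noise rarely changes the winning class, while the bound shows how the privacy cost grows with the number of labels the student requests.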

Experiment.
In the third type of model, PATE uses the aggregated results of the "teacher" models to train the "student" model, so that attackers cannot directly access the teacher models, the private data, or the teachers' parameters. PATE flexibly supports various models, especially deep neural networks. Experiments show that the PATE model achieves high accuracy on the MNIST and SVHN datasets, reaching 98% accuracy on the MNIST dataset ($\varepsilon = 2.04$, $\delta = 10^{-5}$), as shown in Table 3.

Comparative Analysis of These Three Categories
Figure 3: Framework of PATE [54].

All three types of methods add noise to achieve differential privacy. The first category adds noise to the model gradient and, on this basis, considers how to allocate the privacy budget (dynamically or statically) and how to add noise (fixed or not). The second category is based on the functional mechanism and protects privacy by perturbing the objective function of the optimization problem rather than its result. The third category is a new framework that trains the teacher models on separate partitions of the data, makes decisions based on the teachers' predictions with noise added to them, and then introduces the student model to maintain the privacy guarantee. These three types of methods are representative ways of applying differential privacy in DNNs to protect users' private data. The comparative analysis of these three types of models is shown in Table 4.
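The second category, perturbing the objective function rather than the result, is the least intuitive of the three, so a small sketch may help. It uses one-dimensional linear regression as a stand-in for a deep model. The bounds |x| <= 1 and |y| <= 1, the conservative sensitivity value of 8, and the fallback for a non-positive quadratic coefficient are assumptions of this sketch and are not taken from the surveyed papers.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def functional_mechanism_1d(xs, ys, epsilon, rng):
    """Fit y ~ w*x by minimizing a noisy version of the squared-error objective.

    The objective sum((y - w*x)^2) is a polynomial in w:
        sum(y^2) - 2*sum(x*y)*w + sum(x^2)*w^2
    Assuming |x| <= 1 and |y| <= 1, one record contributes at most
    1 + 2 + 1 = 4 to the absolute coefficients, so replacing a record
    changes them by at most 8 (used here as a conservative sensitivity).
    """
    sensitivity = 8.0
    scale = sensitivity / epsilon
    # The constant term does not affect the minimizer, so only the linear and
    # quadratic coefficients are computed and perturbed.
    lin = -2.0 * sum(x * y for x, y in zip(xs, ys)) + laplace(scale, rng)
    quad = sum(x * x for x in xs) + laplace(scale, rng)
    if quad <= 0:
        # Noise made the objective unbounded; fall back to w = 0 (a simplification).
        return 0.0
    # Minimizer of the noisy quadratic.
    return -lin / (2.0 * quad)


rng = random.Random(0)
xs = [i / 50.0 - 1.0 for i in range(101)]  # inputs in [-1, 1]
ys = [0.5 * x for x in xs]                 # targets generated with w = 0.5
w_private = functional_mechanism_1d(xs, ys, epsilon=1.0, rng=rng)
w_almost_exact = functional_mechanism_1d(xs, ys, epsilon=1e9, rng=rng)
```

The noise is injected once into the coefficients of the objective, and the optimization itself then runs without further noise, which is the main contrast with the gradient-perturbation methods of the first category.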

Differential Privacy in GAN
A Generative Adversarial Network (GAN) is a model that estimates the distribution of the training dataset and uses this distribution to randomly generate samples [55]. However, because of its high complexity, the model can easily memorize training samples, which leads to the leakage of users' private data. By repeatedly sampling from the learned distribution, there is a considerable chance of recovering training samples. For example, Hitaj et al. [56] introduced an active inference attack model that can reconstruct training samples from the generated samples.
The general idea of using differential privacy to protect user privacy in a GAN is to add noise to the discriminator during training, in coordination with the training of the generator, as in [57][58][59]. Reference [59] proposed an AC-GAN model for clinical data sharing that does not leak users' private data. We now introduce some methods that use differential privacy to protect GANs.

Algorithm.
Liyang et al. [57] proposed DPGAN, a framework that combines the differential privacy method in [23] with GAN. This model adds carefully designed noise during the training process, performs gradient clipping, and uses the Wasserstein distance [60] to approximate the distance between probability distributions, which is more reasonable than the JS-divergence used in the original GAN. Specifically, when computing the gradient of the discriminator D with respect to a real sample x, the gradient is first clipped (line 6) so that its sensitivity is bounded. Then, random noise sampled from a Gaussian distribution is added. RMSProp is an optimization algorithm that adaptively adjusts the learning rate according to the magnitude of the gradient [61]. The detailed algorithm is shown in Algorithm 3. The clip function in Algorithm 3 relies on the following conditions: the activation function of the discriminator has a bounded range and bounded derivatives everywhere, $\sigma(\cdot) \le B_\sigma$ and $\sigma'(\cdot) \le B_{\sigma'}$, and every data point $x$ satisfies $\|x\| \le B_x$; then $\|g_w(x^{(i)}, z^{(i)})\| \le c_g$ for some constant $c_g$.
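One noisy discriminator update of this kind can be sketched in a few lines. The sketch makes several simplifying assumptions that are not taken from [57]: the critic is linear, a plain gradient step replaces RMSProp, the Gaussian noise has standard deviation sigma_n * c_g, and c_g is treated as a given bound rather than derived from the network.

```python
import random


def dp_critic_step(w, real_batch, fake_batch, lr, sigma_n, c_g, c_p, rng):
    """One noisy update of a linear critic f_w(x) = sum(w[j] * x[j]).

    For this critic the per-example gradient of f_w(x) - f_w(g(z)) with
    respect to w is simply x - g(z).
    """
    m, d = len(real_batch), len(w)
    grad_sum = [0.0] * d
    for x, fake in zip(real_batch, fake_batch):
        for j in range(d):
            grad_sum[j] += x[j] - fake[j]
    # Add Gaussian noise with standard deviation sigma_n * c_g to the summed
    # gradient, then average over the batch.
    noisy = [(g + rng.gauss(0.0, sigma_n * c_g)) / m for g in grad_sum]
    # Plain gradient ascent stands in for the RMSProp update.
    w = [wj + lr * gj for wj, gj in zip(w, noisy)]
    # Clip every weight into [-c_p, c_p].
    return [max(-c_p, min(c_p, wj)) for wj in w]


rng = random.Random(0)
real = [[1.0, 0.0], [1.0, 0.0]]   # toy real samples
fake = [[0.0, 1.0], [0.0, 1.0]]   # toy generated samples g(z)
w_no_noise = dp_critic_step([0.0, 0.0], real, fake, lr=0.1,
                            sigma_n=0.0, c_g=1.0, c_p=0.05, rng=rng)
w_noisy = dp_critic_step([0.0, 0.0], real, fake, lr=0.1,
                         sigma_n=1.0, c_g=1.0, c_p=0.05, rng=rng)
```

Only the discriminator update touches real data, so the generator, which is trained through the discriminator, inherits the privacy guarantee by post-processing.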

Experiment
The accuracy of DPGAN on the MNIST dataset is shown in Figure 4. From left to right, the figure shows the real data, the generated nonprivate samples, and the generated samples for $\varepsilon = 11.5$, $3.2$, $0.96$, and $0.72$, respectively. As can be seen from this figure, as $\varepsilon$ decreases (that is, as the noise level increases), the accuracy obtained with the generated samples decreases, which reflects the trade-off between privacy and the utility of the generated samples.
Figure 4: Accuracy of DPGAN on digits 4 and 5 [46].

Algorithm 3: DPGAN.
Parameter: $\alpha_d$, learning rate of discriminator. $\alpha_g$, learning rate of generator. $c_p$, parameter clip constant. $m$, batch size. $M$, total number of training data points in each discriminator iteration. $n_d$, number of discriminator iterations per generator iteration. $n_g$, generator iteration. $\sigma_n$, noise scale. $c_g$, bound on the gradient of Wasserstein distance with respect to weights.
Ensure: Differentially private generator $\theta$.
Initialize discriminator parameters $w_0$, generator parameters $\theta_0$.
for $t_1 = 1, \dots, n_g$ do
  for $t_2 = 1, \dots, n_d$ do
    Sample $\{z^{(i)}\}_{i=1}^{m} \sim p(z)$, a batch of prior samples.
    Sample $\{x^{(i)}\}_{i=1}^{m} \sim p_{data}(x)$, a batch of real data points.
    ...
  Sample $\{z^{(i)}\}_{i=1}^{m} \sim p(z)$, another batch of prior samples.
  ...

6.1. GANobfuscator
6.1.1. Framework.
GANobfuscator is a model proposed by Chugui Xu et al. [58] that uses differential privacy to mitigate the leakage of private information in GANs. This algorithm adds well-designed noise to the learning process of GANs. With this algorithm, analysts can generate unlimited synthetic samples for any task without leaking information about the training samples. The general framework of this algorithm is shown in Figure 5.
The private data X reaches the discriminator D through the privacy protection layer. The role of the discriminator is to distinguish the real data from the artificial dataset generated by the differentially private generator G during training. The implementation of GANobfuscator is similar to the method in [23], which adds noise during the training process. Comparing the discriminator D with the generator G, G generally uses the construction module [62] and batch normalization [63] to generate samples, whereas D has only a simple structure and a small number of parameters, and D can access the real data directly. Therefore, the privacy loss is easier to measure for D, and noise only needs to be added when training the discriminator D.

Figure 6: Accuracy of GANobfuscator [47].

Algorithm 4: Optimized GANobfuscator.
Input: $\alpha_d$, learning rate of discriminator. $\alpha_g$, learning rate of generator. $c_p$, constant for parameter clip. $m$, batch size. $M$, total number of training data points in each discriminator iteration. $T_d$, the number of discriminator iterations per generator iteration. $T_g$, generator iteration. $\sigma$, noise scale. $c_g$, bound on the gradient of Wasserstein distance with respect to weights. $D_{pub}$, public data. $D_{pri}$, private data.
Output: Differentially private generator G.
Initialize discriminator parameters $w_0$, generator parameters $\lambda$;
...

Algorithm
The algorithm flow of GANobfuscator is similar to that of DPGAN. The difference between the two lies in the clip function used when clipping the gradient (see Algorithm 3). The problem with Algorithm 3 is that the quality of the generated samples is low and the model converges slowly. To solve this problem, Chugui Xu et al. [58] designed an optimized GANobfuscator algorithm. This method enhances GANobfuscator with an adaptive clipping function that monitors changes in the gradient and dynamically adjusts the clipping range, so that the model converges faster and achieves stronger privacy. The optimized GANobfuscator algorithm is shown in Algorithm 4.
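The idea of adjusting the clipping range dynamically can be sketched as follows. The rule used here, setting the bound to the mean gradient norm measured on a batch of public data, is an assumption made for illustration; the exact rule of [58] is not reproduced in this section, and all function names are hypothetical.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def clip_to_norm(v, bound):
    """Scale v down so that its L2 norm is at most `bound`."""
    norm = l2_norm(v)
    factor = min(1.0, bound / norm) if norm > 0 else 1.0
    return [a * factor for a in v]


def adaptive_noisy_gradient(private_grads, public_grads, sigma, rng):
    """Clip private per-example gradients to a bound estimated on public data.

    The bound is the mean L2 norm of the public gradients, so it follows the
    gradient magnitude as training progresses (an assumed rule for this sketch).
    """
    bound = sum(l2_norm(g) for g in public_grads) / len(public_grads)
    clipped = [clip_to_norm(g, bound) for g in private_grads]
    d = len(private_grads[0])
    summed = [sum(g[j] for g in clipped) for j in range(d)]
    # Noise is scaled by the current bound, which is the sensitivity of the sum.
    noisy = [(s + rng.gauss(0.0, sigma * bound)) / len(clipped) for s in summed]
    return noisy, bound


rng = random.Random(0)
public = [[3.0, 4.0], [0.0, 5.0]]      # both have norm 5
private = [[30.0, 40.0], [0.0, 1.0]]   # first gradient is far too large
grad, bound = adaptive_noisy_gradient(private, public, sigma=0.0, rng=rng)
```

Because the bound is computed from public data only, choosing it does not consume any privacy budget in this sketch; a fixed bound that is too large adds unnecessary noise, and one that is too small discards gradient information.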

Experiment.
With $\varepsilon = 2$ and $\delta = 10^{-5}$, different numbers of samples are randomly selected from the generated data to train a classifier, which is then tested on the MNIST dataset. After 100 repetitions, the experimental accuracy is obtained as shown in Figure 6. Experimental results show that the accuracy of the GANobfuscator model is higher than that of the GAN without noise, and that the model can supply a much larger number of samples, which makes the generated data highly usable.
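The evaluation protocol (train on generated samples, test on real data, repeat and average) can be written down generically. The sketch below uses a one-dimensional toy generator and a nearest-centroid classifier as stand-ins; neither is the model or the classifier used in [58], and all names are hypothetical.

```python
import random


def nearest_centroid_fit(samples):
    """Return the per-class mean of 1-D (value, label) samples."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_centroid_predict(centroids, value):
    """Predict the label whose centroid is closest to `value`."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def evaluate_generator(sample_fn, test_set, n_samples, repeats, rng):
    """Mean test accuracy of classifiers trained only on generated samples."""
    accuracies = []
    for _ in range(repeats):
        # Alternate labels so that both classes appear in every training set.
        synthetic = [sample_fn(i % 2, rng) for i in range(n_samples)]
        centroids = nearest_centroid_fit(synthetic)
        correct = sum(nearest_centroid_predict(centroids, v) == y
                      for v, y in test_set)
        accuracies.append(correct / len(test_set))
    return sum(accuracies) / repeats


def toy_generator(label, rng):
    """Stand-in for a trained generator: class k is drawn around the value k."""
    return (rng.gauss(float(label), 0.1), label)


rng = random.Random(0)
real_test = [(-0.1, 0), (0.1, 0), (0.9, 1), (1.1, 1)]  # toy "real" test data
mean_acc = evaluate_generator(toy_generator, real_test,
                              n_samples=10, repeats=100, rng=rng)
```

Varying n_samples in such a loop gives the accuracy as a function of the number of generated training samples, which is the kind of curve reported in Figure 6.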

Comparative Analysis.
Both GANobfuscator and DPGAN are based on the WGAN model and add noise to the gradient during discriminator training to achieve differential privacy. Their difference lies in the way in which the gradient is clipped. The clipping in DPGAN guarantees that the functions $\{f_w(x)\}_{w \in W}$ are all $K_w$-Lipschitz and bounds the gradient of each data point. GANobfuscator monitors changes in the gradient through adaptive clipping and dynamically adjusts the clipping range to achieve faster convergence and stronger privacy. The authors of [58] conducted an experiment on the ability of the model to resist inference attacks. The experimental results are shown in Figure 7. On the CelebA dataset, the GANobfuscator model resists inference attacks better than the GAN, dp-GAN, and DP-GAN models.
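To make the Lipschitz statement concrete, consider a deliberately simple case that is not taken from [57] or [58]: a linear critic $f_w(x) = w^\top x$ with $d$ weights, each clipped to $[-c_p, c_p]$. The following derivation is only an illustration under that assumption, not the constant used by either model.

```latex
\begin{aligned}
|f_w(x) - f_w(x')| &= |w^\top (x - x')| \\
                   &\le \|w\|_2 \, \|x - x'\|_2 \\
                   &\le c_p \sqrt{d} \, \|x - x'\|_2 .
\end{aligned}
```

In this toy case $K_w \le c_p \sqrt{d}$, and the per-example gradient $\nabla_w \left( f_w(x) - f_w(x') \right) = x - x'$ has norm at most $2B_x$ whenever $\|x\| \le B_x$ and $\|x'\| \le B_x$. A fixed weight clip therefore yields a fixed, possibly loose, gradient bound, which is the bound that an adaptive clipping range tries to tighten.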

Conclusion
In the current era of information explosion, the widespread application of deep learning makes user privacy easy to leak. The development of differential privacy technology provides new ideas for privacy protection in deep neural networks (DNNs). Using differential privacy to protect data in DNNs is usually achieved by adding noise during the stochastic gradient descent process. We compared and analyzed several examples of combining differential privacy with DNNs and classified them into three categories. The first category adds noise to the model gradient and, on this basis, considers how to assign the privacy budget and how to add noise. The second category is based on the functional mechanism and adds noise to the objective function instead of its result. The third is a new framework designed to support various models flexibly; it relies on the aggregation of multiple teacher models, with added noise, to make decisions. We also examined the application of differential privacy in Generative Adversarial Networks, namely GANobfuscator and DPGAN. Both are implemented by adding noise to the discriminator, but their gradient clipping methods are different.
Although the application of differential privacy in deep learning is still in its infancy, it has great potential, and many methods are worth exploring. Differential privacy can be widely applied in scenarios that require privacy protection, such as recommendation systems, face recognition, and action recognition. In the future, differential privacy may be combined with federated learning and transfer learning, or used to defend against adversarial attacks to improve the robustness of the model.

Conflicts of Interest
The authors declare that they have no conflicts of interest.