Federated Learning Model with Adaptive Differential Privacy Protection in Medical IoT

With the proliferation of intelligent services and applications enabled by artificial intelligence, the Internet of Things has penetrated many aspects of daily life, and the medical field is no exception. The medical Internet of Things (MIoT) can be applied to wearable devices, remote diagnosis, mobile medical treatment, and remote monitoring. The databases of medical institutions hold a large amount of medical information. However, because medical data are closely tied to personal privacy, they cannot be shared freely, which results in data islands. Federated learning (FL), a distributed collaborative artificial intelligence method, provides a solution, but FL itself raises multiple security and privacy issues. This paper proposes an adaptive Differential Privacy Federated Learning Medical IoT (DPFL-MIoT) model. Specifically, for the local model update on the user side, we propose a differential privacy federated learning deep neural network with adaptive gradient descent (DPFLAGD-DNN) algorithm, which adaptively adds noise to the model parameters according to the characteristics and gradient of the training data. Since privacy leaks often occur in the downlink, we also present a differential privacy federated learning (DP-FL) algorithm in which adaptive noise is added to the parameters when the server distributes them. Our method reduces the addition of unnecessary noise while maintaining good model performance. Experimental results on real-world data show that the proposed algorithm can effectively protect data privacy.


Introduction
Big data-driven artificial intelligence (AI) has been applied to many aspects of the Internet of Things (IoT), and medical services [1,2] are one of the most promising applications of IoT. Advances in medical technology can make health inspections more accurate, which is important for caring for patients and preventing disease. In the medical Internet of Things (MIoT), "things" include doctors, patients, people who care about health, medical devices, and medicines; "network" refers to the workflow of medical treatment and health management, the process of weaving "things" into an intelligent medical "net." As machine learning (ML) and artificial intelligence develop together, the value of the medical IoT is increasing. However, because medical data are closely tied to personal privacy, they cannot be shared freely, which results in data islands. Thus, the rapid growth of medical IoT applications requires a safe and reliable distributed learning system [3].
In many medical IoT applications, distributed machine learning is the first choice for data processing tasks. Federated learning (FL) is a recent development in distributed machine learning, in which data are acquired and processed locally on the client and only the updated ML parameters are transmitted to the central server for aggregation [4,5]. The goal of FL is to fit a model by minimizing an empirical risk minimization (ERM) objective.
Federated learning can be traced back to federated optimization, which was introduced to decouple the collection of data from computation on a central server [4]. Federated optimization has since been extended to deep learning platforms, where it is called federated learning [6]. Today, federated learning has become a tool of choice in many engineering fields, such as data analysis [7], speech recognition [8], autonomous driving [9], and image processing [10]. In addition, deep learning has applications in many fields of healthcare [11][12][13].
However, training federated learning algorithms requires a large number of data samples as well as the exchange of data with other devices. From a technical point of view, the introduction of computing solutions in healthcare has also caused various problems. Deep learning requires large data sets, and too little data leads to underfitting or overfitting of the neural network model. Moreover, the data used for training may contain personal private information, such as medical records, user files, and genetic information [14,15]. The training process of the model may also disclose private information. Attackers can exploit the correlation between the characteristics of sensitive data and the model output to predict sensitive personal information from the released model and some background information, which increases the risk of private information leakage [16].
Designing a federated learning solution that meets privacy requirements is a challenge. Recent privacy protection schemes are mainly based on three methods: secure multiparty computation (SMC), differential privacy (DP), and homomorphic encryption (HE). However, SMC is not suitable for most MIoT application scenarios, because MIoT needs a noninteractive protocol to perform secure aggregation. Although HE is an effective method to prevent privacy leakage during training, the interactive decryption of HE seriously increases communication overhead, especially for deep neural networks, whose many hidden layers carry a large number of parameters. Therefore, HE is not applicable to some medical IoT edge devices with limited computing power (such as smart bracelets and sensors).
In the past few years, differential privacy for federated learning has been extensively studied [17]. However, the existing methods always add the same noise in the gradient descent process, without considering the influence of data with different characteristics on the model output. In addition, privacy leaks often occur when the server distributes parameters to the clients.
In order to effectively prevent information leakage and solve the problem of data islands, this paper proposes an adaptive Differential Privacy Federated Learning Medical IoT (DPFL-MIoT) model. In this model, each client performs local training and adaptively adds noise before uploading its parameters to the server for aggregation. Furthermore, to account for privacy leakage in the downlink, noise is also added to the parameters transmitted by the server to the users participating in the training.
The main contributions of this paper are summarized as follows:
(1) We propose an adaptive Differential Privacy Federated Learning Medical IoT (DPFL-MIoT) model, which can effectively protect users' private data while achieving satisfactory results.
(2) We present a differentially private federated learning deep neural network with adaptive gradient descent (DPFLAGD-DNN) algorithm based on layer-wise relevance propagation (LRP) [18]. During training, DPFLAGD-DNN adds more noise to the features that are less relevant to the model output and less noise to the features that are more relevant. In the stochastic gradient descent stage, the optimal step size is selected from a candidate set. Because privacy leaks often occur when the server distributes the parameters, we also propose a differential privacy federated learning (DP-FL) algorithm, in which adaptive noise is added to the distributed parameters.
(3) Extensive experiments are conducted on real-world data sets to verify the accuracy of the proposed algorithm. The evaluation results show that our algorithm protects privacy while achieving good accuracy.
The remainder of this article is organized as follows. The second part reviews related work. The third part introduces the system model. The fourth part gives background knowledge on federated learning, differential privacy, and the LRP algorithm. The fifth part analyzes the proposed DPFLAGD-DNN algorithm in detail. The sixth part presents the experimental and simulation results. The seventh part summarizes the paper.

Related Work
Combining different privacy protection methods with deep learning is a challenging task, and many scholars have conducted research on it.
In [6], McMahan et al. proposed a distributed training method that injects noise into the gradient to update the parameters and protect the private information of the neural network model. But in this method, the size of injected noise is accumulated in proportion to the number of training times and the number of parameters. Therefore, it may consume a large amount of privacy budget, because the number of training iterations and the number of parameters shared between multiple parties are usually very large.
Google first proposed the concept of federated learning [19]. In federated learning, participants store all training data locally, train the model locally, and then upload the parameters to the server to update the parameters. Other participants can download the updated parameters to their own devices to improve the accuracy of the local model.
A natural way to prevent information leakage is to add artificial noise, a technique known as differential privacy (DP) [17]. Given the wide applicability of differential privacy in deep learning models, differential privacy can also be used for privacy protection in federated learning. Existing DP-based learning algorithms include local differential privacy (LDP) [20][21][22] and DP-based distributed SGD [23]. In LDP, each client perturbs its information locally and sends only a randomized version to the server, thereby protecting the client and server from the leakage of private information. The work in [21] proposed a solution for building an LDP-compliant SGD, which supports various important ML tasks. The work in [22] considered distributed estimation by the server over the data uploaded by the clients and used LDP to protect these data. The work in [23] improved the computational efficiency of DP-based SGD by tracking detailed information about the privacy loss and obtained an accurate estimate of the overall privacy loss. Abadi et al. [24] used a Gaussian mechanism to perturb the gradient and provided a more rigorous method (the moments accountant) to track the overall privacy loss. Wang et al. [25] proposed an LDP-compliant SGD solution, which supports various important machine learning tasks. Wang et al. [26] considered distributed estimation by the server over the uploaded data while using LDP to protect these data.
To capture the trade-off between privacy and aggregation performance during the training process, Geyer et al. [27] proposed an FL algorithm that protects client privacy. The algorithm can obtain good training performance at a given privacy level, especially when there are enough participating clients. However, the abovementioned DP-based FL designs did not consider privacy protection in the parameter upload stage; that is, when the training results are uploaded to the server, the client's private information may be intercepted by hidden adversaries. Similar to [27], Wei et al. [28] proposed a differential privacy federated learning framework that adds noise both at the server and during local training on the user side, and explored in detail the effects of different privacy protection levels, the number of clients participating in training, and gradient clipping.

System Model
This section presents a federated learning medical IoT model based on differential privacy and layer-wise relevance propagation. Our DPFL-MIoT model is shown in Figure 1.

Model Overview.
It can be seen from Figure 1 that our model is mainly composed of four parts, namely, the medical cloud server, medical institutions, doctors, and users. The functions of these four parts are described in detail as follows.
(1) Medical Cloud Server. The medical cloud server is responsible for distributing model parameters and model aggregation.
(2) Medical Institutions. Medical institutions are hospitals. They have their own databases and IoT equipment. At the same time, they store a large number of patients' medical data. The protection of these data is extremely important for patients.
(3) Doctors. Doctors hold part of patient data and provide necessary disease diagnosis and treatment support.
(4) Users. Users may have many IoT devices, such as bracelets and mobile phones. These IoT devices are the carriers of user data in the model and also participate in the training of the model.

Model Principle.
Different from the traditional differential privacy federated learning framework, each client (an entity participating in training) in our DPFL-MIoT adaptively adds noise using the proposed DPFLAGD-DNN algorithm. In addition, to account for privacy leakage in the downlink, the proposed DP-FL algorithm also adds noise to the parameters sent by the server to the users participating in the training. The detailed running process of the model is as follows:
(1) Initialize Model Parameters. First, the underlying users establish communication with the medical cloud server. The medical cloud server distributes the initial parameters of the system according to the amount of data provided by medical institutions, doctors, and users for training the model.
(2) User Local Training. Each client performs model training starting from the parameters obtained from the medical cloud server and adaptively adds noise using DPFLAGD-DNN. After the local training is over, the parameters are uploaded to the server for aggregation.
(3) Parameter Aggregation. Using DP-FL, the medical cloud server aggregates the parameters according to the amount of data that each participating user used to train the model. After parameter aggregation, the parameters are distributed to the users.
(4) Repeat steps 2 and 3 until the model reaches the desired performance.

Preliminaries
In this part, we will introduce the background knowledge of federated learning, differential privacy, and LRP algorithm.
The notations used in this paper are listed in Table 1.

Federated Learning.
Federated learning [19] is an advanced distributed privacy-preserving machine learning technology that enables edge nodes to collaboratively train a shared global model without uploading private local data to a central server. Consider a general FL system consisting of a server and $N$ clients. $D_i$ denotes the local data set held by client $C_i$, where $i \in \{1, 2, \ldots, N\}$. The goal of the server is to learn a model over the data held by the $N$ associated clients. When a client participates in local training, it needs to find a parameter vector $w$ of the FL model that minimizes a loss function. Formally, the server aggregates the weights received from the $N$ clients as

$$w = \sum_{i=1}^{N} p_i w_i, \tag{1}$$

where $w_i$ is the parameter vector trained at the $i$-th client, $w$ is the parameter vector aggregated on the server side, $N$ is the number of clients, $p_i = |D_i|/|D| \ge 0$ with $\sum_{i=1}^{N} p_i = 1$, and $|D| = \sum_{i=1}^{N} |D_i|$ is the total size of all data samples. Such an optimization problem can be formally defined as

$$w^{*} = \arg\min_{w} \sum_{i=1}^{N} p_i F_i(w), \tag{2}$$

where $F_i(\cdot)$ is the loss function of the $i$-th client. Generally speaking, the local loss function $F_i(\cdot)$ is given by the local empirical risk. The training process of an FL system generally includes the following four steps:
Step 1: local training. All participating clients compute training gradients or parameters locally and then send the locally trained ML parameters to the server.
Step 2: model aggregation. The server securely aggregates the parameters uploaded by the clients without learning local information.
Step 3: parameter broadcast. The server broadcasts the aggregated parameters to the clients.
Step 4: model update. All clients update their models with the aggregated parameters and then test the performance of the updated models.
In the FL process, clients learn ML models collaboratively with the help of cloud servers. After a sufficient number of local training rounds and update exchanges between the server and its associated clients, the solution of the optimization problem (2) can converge to the globally optimal learning model. Users participating in model training can then use the model locally.
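The aggregation step described above can be illustrated with a minimal sketch. It assumes that each client's parameter vector is weighted by its share of the total data, which is the usual federated averaging convention; the function name `aggregate` and the list-based representation of parameter vectors are illustrative choices, not part of the original model.

```python
from typing import List, Sequence


def aggregate(client_weights: Sequence[Sequence[float]],
              client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors with p_i = |D_i| / |D|."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one data-set size per client parameter vector")
    total = sum(client_sizes)          # |D|, the total number of samples
    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for w_i, n_i in zip(client_weights, client_sizes):
        p_i = n_i / total              # share of the data held by client i
        for j in range(dim):
            aggregated[j] += p_i * w_i[j]
    return aggregated


# Two clients holding 1 and 3 samples, respectively.
w = aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(w)  # [2.5, 3.5]
```

Clients with more data pull the aggregate toward their own parameters, which is why the second client dominates in the example.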
$(\epsilon, \delta)$-DP.
$(\epsilon, \delta)$-DP [17] is defined over adjacent data sets. Two data sets $D$ and $D'$ are adjacent if $|(D \setminus D') \cup (D' \setminus D)| = 1$; in other words, $D$ and $D'$ differ by at most one record. Here, $\epsilon > 0$ is the privacy budget, and $\delta$ is the slack term. For any given $\delta$, $\epsilon$ is negatively correlated with the noise. We formally define DP as follows.

Wireless Communications and Mobile Computing
Definition 1 ($(\varepsilon, \delta)$-DP [17]). Given a randomized function $M$, if for any two adjacent data sets $D$ and $D'$ and any output set $S \subseteq \mathrm{Range}(M)$ the following inequality holds,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta, \tag{3}$$

then the function $M$ satisfies $(\varepsilon, \delta)$-DP. In (3), $\delta$ is the relaxation factor. If $\delta = 0$, the randomized function $M$ gives pure differential privacy; if $\delta > 0$, $M$ gives approximate differential privacy. The former provides stronger privacy protection than the latter. The parameter $\varepsilon$ balances privacy protection against data availability: the smaller its value, the higher the level of privacy protection and the lower the data availability, and vice versa. Implementing differential privacy requires adding noise, whose magnitude is closely related to the global sensitivity of the data set.
For numerical data, the Gaussian mechanism defined in [17] can be used to ensure $(\varepsilon, \delta)$-DP. Following [17], we adopt the following DP mechanism, which adds artificial Gaussian noise.
To ensure that the noise $n \sim \mathcal{N}(0, \sigma^2)$ preserves $(\varepsilon, \delta)$-DP, where $\mathcal{N}$ denotes the Gaussian distribution, for $\varepsilon \in (0, 1)$ we choose the noise scale $\sigma \ge c\Delta s/\varepsilon$ with constant $c \ge \sqrt{2 \ln(1.25/\delta)}$. Here, $n$ is the value of the noise sample added to the data set, $\Delta s$ is the sensitivity of the function $s$, given by $\Delta s = \max_{D_i, D_i'} \| s(D_i) - s(D_i') \|$, and $s$ is a real-valued function.
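As a worked illustration of the noise scale above, the following sketch computes the smallest admissible $\sigma$ and perturbs a vector of values. The helper names `gaussian_sigma` and `add_gaussian_noise` are hypothetical, and the sketch assumes the stated range $\varepsilon \in (0, 1)$, rejecting other values rather than extrapolating the bound.

```python
import math
import random
from typing import List, Sequence


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Smallest sigma allowed by sigma >= c * sensitivity / epsilon,
    with c = sqrt(2 * ln(1.25 / delta))."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the bound is stated for epsilon in (0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    c = math.sqrt(2.0 * math.log(1.25 / delta))
    return c * sensitivity / epsilon


def add_gaussian_noise(values: Sequence[float], sigma: float,
                       rng: random.Random) -> List[float]:
    """Add independent N(0, sigma^2) noise to every coordinate."""
    return [v + rng.gauss(0.0, sigma) for v in values]


sigma = gaussian_sigma(epsilon=0.5, delta=0.01, sensitivity=1.0)
noisy = add_gaussian_noise([0.2, -0.1, 0.7], sigma, random.Random(0))
print(round(sigma, 3))  # about 6.215
```

Halving $\varepsilon$ doubles $\sigma$, which makes the privacy and accuracy trade-off discussed in the text concrete.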
In view of the above DP mechanism, choosing an appropriate noise level is still an important research issue, which will affect the privacy protection level of the client and the convergence speed of the federated learning process.
There are some lemmas for differential privacy. In differentially private deep learning, we can use the following lemmas.
Lemma 2 (post-processing) [29]. Any computation on the output of a differentially private mechanism does not increase the privacy loss.
Lemma 3 (sequential composition) [29]. The sequential composition of differential privacy mechanisms still satisfies differential privacy.
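The composition property is what allows a training loop to account for privacy step by step. The sketch below is a minimal budget tracker under the assumption of basic sequential composition, in which the $\varepsilon$ values of successive mechanisms simply add up; the class name `PrivacyBudget` and its method are hypothetical.

```python
class PrivacyBudget:
    """Tracks the remaining budget under basic sequential composition,
    where the epsilons of successive mechanisms add up."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total budget must be positive")
        self.remaining = total_epsilon

    def try_spend(self, epsilon: float) -> bool:
        """Spend epsilon if it fits; otherwise refuse and leave the budget unchanged."""
        if epsilon <= 0:
            raise ValueError("each step must consume a positive budget")
        if epsilon > self.remaining:
            return False
        self.remaining -= epsilon
        return True


budget = PrivacyBudget(1.0)
print(budget.try_spend(0.5), budget.try_spend(0.5), budget.try_spend(0.25))
# True True False
```

A training step would only be executed when `try_spend` returns `True`, so the total privacy loss never exceeds the configured budget.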

Layer-Wise Relevance Propagation (LRP).
Layer-wise relevance propagation (LRP) [18] is an algorithm designed to calculate the correlation between each input feature and the model output.
Definition 4 (Correlation decomposition) [18]. Given hidden layers $h_1, h_2, \ldots, h_l$, let $R^{l}_{p}(x_i)$ denote the correlation between neuron $p$ in layer $h_l$ and the model output for the input $x_i$. The correlation decomposition follows the layer-wise propagation rule of [18], in which a predefined stabilizer parameter is used to overcome the unboundedness of the correlation. The rule is expressed in terms of the affine transformation of neuron $p$, $\sum_{q} a_q \omega_{qp} + b_p$, where $a_q$ is the value of neuron $q$, $\omega_{qp}$ is the weight between neuron $q$ and neuron $p$, and $b_p$ is a bias term.
In order to realize the backward propagation of the correlation, the correlation between the neurons in the last hidden layer and the model output must first be derived. Given the output variable $m$, the correlation $R^{l}_{p}(x_i)$ is computed from the model output as in [18]. By using formulas (4), (5), and (7), the correlation of each hidden neuron and of each input feature can be obtained [18]. Here, $R_{x_{ij}}(x_i)$ is the correlation between the input feature $x_{ij}$ and the model output $f_{x_i}(\omega)$.
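For reference, the stabilized propagation rule that is commonly stated for LRP can be written as follows. This is a sketch under the assumption that the standard stabilized rule from the LRP literature is the one intended here; the symbols $z_{qp}$, $z_p$, and $\mu$ (the stabilizer) are notational assumptions, and the numbering of the original formulas is not reproduced.

```latex
\begin{align*}
z_{qp} &= a_q\,\omega_{qp}, \qquad z_p = \sum_{q} z_{qp} + b_p, \\
R^{(l)}_{q \leftarrow p}(x_i) &=
\begin{cases}
\dfrac{z_{qp}}{z_p + \mu}\, R^{(l+1)}_{p}(x_i), & z_p \ge 0, \\[2ex]
\dfrac{z_{qp}}{z_p - \mu}\, R^{(l+1)}_{p}(x_i), & z_p < 0,
\end{cases} \\
R^{(l)}_{q}(x_i) &= \sum_{p} R^{(l)}_{q \leftarrow p}(x_i).
\end{align*}
```

Under this rule, each neuron $p$ in layer $l+1$ redistributes its relevance to the neurons $q$ of layer $l$ in proportion to their contributions $z_{qp}$, and the stabilizer keeps the ratio bounded when $z_p$ is close to zero.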

Federated Learning with Adaptive Differential Privacy
The approach proposed in this paper has the following two contributions. First, different from the traditional framework, the DPFLAGD-DNN algorithm builds on the Laplace mechanism and the LRP algorithm to perturb the gradient adaptively according to the correlation between different features and the model output.
Second, in the global DP-FL algorithm, the server sets a different privacy budget for each user according to the data that the user provides for training the model and adds a suitable amount of noise to the parameters distributed to the users in the downlink. Therefore, DP-FL can handle data imbalance and is superior to traditional methods.

User Parameter Update.
In the user's local model update, we use noise disturbance to achieve differential privacy protection.
In order to design an SGD algorithm that satisfies differential privacy, a certain amount of noise is added to the gradient update process. However, adding the same noise to every gradient degrades the performance of the model. For this reason, we propose the DPFLAGD-DNN algorithm. First, it adds noise adaptively according to the correlation between different input features and the model output. Second, it adapts the noise to the training stage: at the beginning of training, the loss function still needs many iterations to converge, so adding noise at this time does not greatly influence the direction of gradient descent.
However, as the optimization proceeds, the global model gets closer and closer to the optimal value, and the direction of gradient descent becomes more precise. At this stage, even slight noise has a great influence on the descent direction. Therefore, as the number of iterations increases, the noise should be reduced accordingly. Reducing the noise also prevents users from running out of privacy budget, which would force them to quit training prematurely.
Step 1 (lines 1 and 2). The parameter w is received from the server and the loop is initialized.
Step 2 (lines 3-6). Calculate the average correlation $R_j(D)$ of input feature $j$ for the specific layer.
To guarantee $R_j(D) \in [0, 1]$, each $R_j(D)$ is normalized to $(R_j(D) - \gamma)/(\kappa - \gamma)$, where $\kappa$ and $\gamma$ denote the maximum and minimum values of the set $\{R_1(D), R_2(D), \ldots, R_d(D)\}$, respectively. We propose a correlation ratio to adaptively add noise, so that more noise is added to features that are less correlated with the model output, and vice versa. For each feature $j$ in a specific layer, this ratio, denoted $\beta_j$, determines the feature-specific privacy budget $\epsilon_j$.
Step 3 (lines 7-10). Add Laplace noise $\mathrm{Lap}(\Delta/\epsilon_j)$ to each input feature. Algorithm 1 gives the pseudocode of the DPFL framework we propose. Then, the gradient descent algorithm is executed on the client, where $g$ denotes the gradient of $x$ in the $t$-th round of updates and $b$ is the batch size. Next, the gradient is clipped. Gradient clipping ensures that the $L_2$ norm of the gradient is bounded by $S$, where $S$ is the global sensitivity.
Input: Training data set, loss function $L(\theta)$, privacy budget $\epsilon$, gradient norm clipping bound $S$, budget increase rate $\alpha$
Output: Weight parameters $w_t$
1: Receive $w$ from server
2: Initialization: $t = 1$, random
3: for $j \in [1, d]$ do
4: Compute the average relevance $R_j(D)$
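The relevance-based budget allocation of Step 2 can be sketched as follows. The sketch assumes that the correlation ratio takes the form $\beta_j = d \cdot R_j(D) / \sum_{j'} R_{j'}(D)$ with $\epsilon_j = \beta_j \epsilon$, in the style of the adaptive Laplace mechanism; this specific form, the function names, and the small `floor` that keeps the least relevant feature from receiving a zero budget are all assumptions made for illustration.

```python
from typing import List, Sequence


def normalize_relevance(relevance: Sequence[float]) -> List[float]:
    """Min-max normalize the average relevances R_j(D) into [0, 1]."""
    lo, hi = min(relevance), max(relevance)
    if hi == lo:                      # all features equally relevant
        return [1.0] * len(relevance)
    return [(r - lo) / (hi - lo) for r in relevance]


def allocate_budget(relevance: Sequence[float], epsilon: float,
                    floor: float = 1e-3) -> List[float]:
    """Split the budget so that more relevant features receive a larger
    epsilon_j (and therefore less noise).

    Assumed form: beta_j = d * R_j / sum(R), epsilon_j = beta_j * epsilon.
    `floor` is a hypothetical safeguard against a zero budget.
    """
    norm = [max(r, floor) for r in normalize_relevance(relevance)]
    total = sum(norm)
    d = len(norm)
    return [d * r / total * epsilon for r in norm]


eps_per_feature = allocate_budget([1.0, 2.0, 3.0], epsilon=1.0)
# The least relevant feature gets the smallest budget, hence the most noise.
assert eps_per_feature[0] < eps_per_feature[1] < eps_per_feature[2]
```

With this form, the per-feature budgets average to the overall $\epsilon$, so the allocation redistributes noise across features rather than changing the total amount of budget.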

Step 4 (lines 11-18). We define a candidate set $\Theta$ of step sizes, which can be preset. Each element of $\Theta$ is associated with the loss function value obtained with the corresponding step size. We then use the LapNoise function (Algorithm 2) to select the best step size, and gradient descent is performed once the optimal step size is obtained.
Algorithm 2 introduces the LapNoise function. Taking the candidate set $\Theta$, the global sensitivity $\Delta f$, and the privacy budget $\varepsilon$ as inputs, the algorithm returns the index $i$ of the best step size. The function $\mathrm{Lap}(\Delta f/\varepsilon)$ denotes a Laplace distribution with mean 0 and scale parameter $\Delta f/\varepsilon$, and $\eta$ is the learning rate, that is, the step size. Finally, the user's updated model parameter $w_t$ is sent to the server.
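The clipping and step-size selection described in Steps 3 and 4 can be sketched as follows. The sketch assumes that LapNoise adds $\mathrm{Lap}(\Delta f/\varepsilon)$ noise to the loss value of each candidate step size and returns the index of the smallest noisy value, and that clipping rescales the gradient by $\max(1, \|g\|_2/S)$; all function names and the `loss_fn` argument are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def laplace(scale: float, rng: random.Random) -> float:
    """Lap(0, scale) sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip_gradient(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Rescale the gradient so that its L2 norm is at most clip_norm (S)."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = max(1.0, norm / clip_norm)
    return [g / factor for g in grad]


def lap_noise_select(losses: Sequence[float], sensitivity: float,
                     epsilon: float, rng: random.Random) -> int:
    """Index of the candidate whose noisy loss value is smallest."""
    scale = sensitivity / epsilon
    noisy = [v + laplace(scale, rng) for v in losses]
    return min(range(len(noisy)), key=noisy.__getitem__)


def descend(w: Sequence[float], grad: Sequence[float],
            step_sizes: Sequence[float],
            loss_fn: Callable[[Sequence[float]], float],
            clip_norm: float, sensitivity: float, epsilon: float,
            rng: random.Random) -> Tuple[List[float], float]:
    """One update: clip, evaluate every candidate step size, pick one privately."""
    g = clip_gradient(grad, clip_norm)
    candidates = [[wi - eta * gi for wi, gi in zip(w, g)] for eta in step_sizes]
    losses = [loss_fn(c) for c in candidates]
    best = lap_noise_select(losses, sensitivity, epsilon, rng)
    return candidates[best], step_sizes[best]


# Toy quadratic loss; a very large epsilon makes the selection nearly noise-free.
new_w, eta = descend([1.0], [2.0], [0.1, 0.5, 1.0],
                     lambda v: sum(x * x for x in v),
                     clip_norm=10.0, sensitivity=1.0, epsilon=1e9,
                     rng=random.Random(0))
print(new_w, eta)  # [0.0] 0.5
```

With a small $\varepsilon$ the selection becomes noisier, so a suboptimal step size may be chosen; this is the price paid for selecting the step size privately.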
Proof. Before the activation function is applied, the output of a hidden-layer node $h$ in the neural network is an affine transformation of its input, so the input of the node can be regarded as a linear function of the output of the previous layer, where $b$ is the bias term and $W$ is the parameter of the specific hidden layer $h$. Given the training batch $L$ and each hidden-layer node $h_L(W_h)$, we add Laplace noise to the bias term $b$ and to the input features $x_i$.
In lines 12 and 13, each input feature $x_{ij}$ of each hidden neuron $h_0$ in the hidden layer $h_L$ is perturbed by adding adaptive Laplace noise $\frac{1}{|L|}\mathrm{Lap}(\Delta_{h_0}/\varepsilon_j)$.
The bias term $b$ in the neural network can be regarded as the 0th input feature in the parameter matrix, denoted $x_{i0}$. $\Delta_{h_0}$ is set based on the maximum value of all input features $x_{ij}$, and $\beta_j$ can be regarded as the proportion of $\Delta_{h_0}$ contributed by input feature $j$ to a hidden-layer neuron $h \in h_L$, where $d$ is the number of features of each tuple $x_i \in D$. Once all neurons in layer $h_L$ have been perturbed, the perturbed output of the layer, $h_L(W)$, is obtained.
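The per-feature perturbation analysed in the proof can be sketched as follows. The sketch assumes that every feature $j$ of every tuple in the batch $L$ receives independent noise $\frac{1}{|L|}\mathrm{Lap}(\Delta/\varepsilon_j)$ with a single sensitivity value $\Delta$; the function names are hypothetical, and the per-feature budgets are taken as given.

```python
import random
from typing import List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Lap(0, scale) sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_batch(batch: Sequence[Sequence[float]],
                  eps_per_feature: Sequence[float],
                  sensitivity: float,
                  rng: random.Random) -> List[List[float]]:
    """Add (1/|L|) * Lap(sensitivity / eps_j) to feature j of every tuple."""
    if any(e <= 0 for e in eps_per_feature):
        raise ValueError("every feature needs a positive budget")
    size = len(batch)                 # |L|, the batch size
    return [
        [x_j + laplace(sensitivity / e_j, rng) / size
         for x_j, e_j in zip(x, eps_per_feature)]
        for x in batch
    ]


rng = random.Random(0)
batch = [[0.0, 0.0] for _ in range(200)]
noisy = perturb_batch(batch, eps_per_feature=[0.1, 10.0], sensitivity=1.0, rng=rng)
mean_abs = [sum(abs(row[j]) for row in noisy) / len(noisy) for j in range(2)]
# The feature with the smaller budget (index 0) is perturbed much more strongly.
assert mean_abs[0] > mean_abs[1]
```

Because the Laplace scale is inversely proportional to $\varepsilon_j$, a feature with a small budget is perturbed strongly, while a highly relevant feature with a large budget stays close to its original value.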

Theorem 6. Algorithm 2 satisfies $\varepsilon$-differential privacy.
Proof. Privacy loss accumulates in each iteration. We use the advanced differential privacy composition theorem [29] to track the privacy loss of each gradient update step. The composition theorem provides a strict bound on the privacy loss. Therefore, according to Lemma 3, we only need to ensure that the privacy budget is not exhausted. In each iteration, two operations cause privacy loss: the perturbation of the input features (line 7), which we have shown above satisfies differential privacy, and the gradient update. Before each gradient update, we check whether the update would make the remaining privacy budget negative (line 11); if it would, the update is not performed. The third line controls the entire algorithm so that it satisfies differential privacy. Therefore, Algorithm 2 satisfies $\varepsilon$-differential privacy.

Server Parameter Aggregation.
Algorithm 3 describes the entire framework of DP-FL in detail. Assuming that a total of $K$ users participate in the training, in each iteration (one communication between the server and the users) a random subset $Z$ of size $n$ ($n \le K$) is sampled. Only users belonging to $Z$ send model parameters to the server.
The UserUpdate function performs the user's parameter update, and $T$ is the number of communication rounds. The server averages the parameters uploaded by the users belonging to $Z$ and then sends the model parameter $w_t$ only to the users belonging to $Z$.
To ensure that the downlink channel satisfies $(\varepsilon, \delta)$-DP over $T$ rounds of communication, the server needs to add additional Gaussian noise $n_D$ to the aggregated parameter $w_t$. For a given $L$, the standard deviation of this additional noise depends on the relationship between the number of aggregations $T$ and the number of clients $K$. Intuitively, the larger $T$ is, the greater the possibility of information leakage, whereas a larger number of clients helps hide the private information of each client.
The users in each iteration are randomly selected from the total of $K$ users. This randomization can reduce the model training time and increase the robustness of the model. Our solution to unbalanced data is to set a different $\varepsilon$ for each user. The noise standard deviation $\sigma$ can be calculated from $\varepsilon$ and $\delta$. We set the same $\delta$ for each user. The more data a user contributes to model training, the less noise is added to the input features.

Algorithm 3: Differentially private federated learning (DP-FL).
Input: Number of users $K$, rounds of communication $T$, privacy parameter set $\{\varepsilon_k\}_{k=0}^{K}$, number of users $n$ participating in each round of communication.
Output: The weight parameters $w_T$ of the server model.
//Server executes
1: Randomly initialize $w_0$
2: for $t = 0, 1, 2, \ldots, T$ do
3: $C_t \leftarrow$ randomly select $n$ users
4: for $k \in C_t$ in parallel do
5: Send the updated parameters $w^{k}_{t+1}$ to the server
7: end for
8: $w_{t+1} \leftarrow w_t + \frac{1}{n}\sum_{k \in C_t} \Delta w^{k}_{t+1}$
9: $w_{t+1} \leftarrow w_{t+1} + n_D$
10: The server broadcasts the global parameters to all participants
11: end for
12: return $w_{t+1}$
//Users execute
13: function UserUpdate($\varepsilon$, $w_t$)
14: $\hat{w} \leftarrow w_t$
15: $w \leftarrow$ DPFLAGD-DNN($\varepsilon$, $w_t$)
16: $\Delta w_{t+1} = w - \hat{w}$
17: return $\Delta w_{t+1}$
Each user executes the model update algorithm locally and uploads the parameters to the server. The server averages the received parameters and sends them to the user set $Z$. For each user, we use the DPFLAGD-DNN method to obtain the local parameters. When a user's privacy budget is exhausted, the risk of leaking that user's data has reached a critical point; the user then quits the federated learning process, while the other users continue to train locally. In this way, we can guarantee that our DP-FL framework satisfies differential privacy.
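One communication round of the server-side procedure can be sketched as follows. The sketch assumes that the server samples $n$ users uniformly at random, averages their parameter updates, and adds Gaussian noise with standard deviation `sigma_d` before broadcasting; `user_update` is a hypothetical stand-in for the local DPFLAGD-DNN training, and the value of `sigma_d` is taken as given rather than derived.

```python
import random
from typing import Callable, List, Sequence, Tuple


def dp_fl_round(w: Sequence[float],
                user_ids: Sequence[int],
                n: int,
                user_update: Callable[[int, Sequence[float]], Sequence[float]],
                sigma_d: float,
                rng: random.Random) -> Tuple[List[float], List[int]]:
    """Sample n users, average their updates, and add downlink Gaussian noise."""
    selected = rng.sample(list(user_ids), n)           # random subset of users
    deltas = [user_update(k, w) for k in selected]     # local training per user
    averaged = [sum(d[j] for d in deltas) / n for j in range(len(w))]
    aggregated = [wj + aj for wj, aj in zip(w, averaged)]
    # Noise added to the parameters before they are sent back to the users.
    noisy = [v + rng.gauss(0.0, sigma_d) for v in aggregated]
    return noisy, selected


def toy_update(user_id: int, w: Sequence[float]) -> List[float]:
    """Hypothetical local update: every user proposes the same shift."""
    return [1.0] * len(w)


new_w, chosen = dp_fl_round([0.0, 2.0], range(10), 3, toy_update,
                            sigma_d=0.0, rng=random.Random(1))
print(new_w)  # [1.0, 3.0] because sigma_d = 0 adds no noise
```

A user whose privacy budget is exhausted would simply be removed from `user_ids` before the next round, which matches the exit rule described in the text.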
It is worth noting that there are many reasons why the accuracy of the model may fall short of the target, for example, the amount of data involved in training is too small or the model is overfitted. In such cases, our approach is to let users adjust the parameters and increase the amount of data before continuing training.

Experiment

Experimental Environment and Data Set.
In this section, we evaluate the proposed DPFL using neural networks and a real-world federated data set. All experiments are performed on a machine equipped with an AMD Ryzen 7 5800H (8 cores, 16 threads) running Ubuntu 18.04. The neural network code is based on the PyTorch framework. To evaluate the prediction accuracy of DPFL, we use the model accuracy commonly used in deep learning as the evaluation index and vary the privacy protection level, the total number of clients, and the maximum number of aggregations, changing one variable at a time while holding the others fixed. These variables are selected because they have been widely used in other federated learning papers.
We use the Fashion-MNIST data set, which is widely used in deep learning research, to evaluate the model. Fashion-MNIST is an image data set designed as a replacement for the MNIST handwritten digit set. It contains front-view images of a total of 70,000 different products from 10 categories. The size, format, and training/test split of Fashion-MNIST are exactly the same as those of the original MNIST: 60,000 training and 10,000 test samples, each a 28 × 28 grayscale image.
Our baseline model is a neural network with a single hidden layer containing 256 hidden units. In this feedforward neural network, we use the ReLU function as the activation function and a softmax output layer over the 10 classes. We set the learning rate to 0.002. We use multiple samples to evaluate this classifier. This setting is comparable to those used in many experiments exploring the effects of DP-FL models.
In order to reduce the experimental error without loss of generality, we run each of the following experiments 10 times and take the average of the results. In addition, given the development of 5G technology, we treat communication delay as negligible.

Experiment and Result Analysis
6.2.1. Protection Level Performance Evaluation. In Figure 2, we evaluate the accuracy of the model under different privacy protection levels in DP-FL: ε = 30, ε = 60, and ε = 90. We also include a baseline model for comparison. In this experiment, we set K = 50, T = 25, and δ = 0.01 and measure how the model's prediction accuracy changes as the number of communication rounds between the users and the server increases. As shown in Figure 2, the classification accuracy decreases as the privacy protection level increases, that is, as ε becomes smaller.
We also conducted experiments at stricter privacy protection levels: ε = 3, ε = 6, and ε = 9. As above, we set N = 50, T = 25, and δ = 0.01. Figure 3 shows that as the level of privacy protection increases, the classification accuracy of DP-FL decreases. Moreover, if the added privacy noise is too large, for example, at ε = 3, the classification accuracy curve fluctuates with the number of global communication rounds, because excessive privacy noise seriously degrades the model's classification results.
Figure 5 shows the impact of the number of clients participating in model training on model performance under the conditions ε = 30, δ = 0.01, and clip = 100. As in real-world settings, the model performs better on the test set as the number of users participating in training increases. This is because more clients not only provide a larger global data set for training but also reduce the standard deviation of the additive noise, so that less noise is added when a user sends parameters to the server.
In Figure 6, we use the number of global communication rounds (Epoch, the number of aggregations) and the total number of clients participating in training as independent variables and the classification accuracy as the dependent variable to explore the relationship between them. The three-dimensional histogram shows that the accuracy of the model increases with both the number of global communication rounds and the number of clients. When one of the independent variables is held fixed, the relationship between accuracy and the other variable follows the curves described above.
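
The two trends discussed above (a smaller ε means more noise, and more clients mean less effective noise) can be illustrated with a minimal sketch. It assumes the classical Gaussian-mechanism noise scale, which is formally derived only for ε < 1 and is not necessarily the calibration used in DP-FL, and it assumes that the server simply averages independently noised client updates. The function names, the sensitivity value of 1.0, and the vector dimension are hypothetical choices made for illustration only.

```python
import math
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity):
    """Classical Gaussian-mechanism noise scale (assumption: formally valid for epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def averaged_noise_std(sigma, num_clients, dim=10_000, seed=0):
    """Empirical std of the noise that remains after averaging num_clients noisy updates."""
    rng = np.random.default_rng(seed)
    # One independent noise vector per client (rows), dim parameters each (columns).
    noise = rng.normal(0.0, sigma, size=(num_clients, dim))
    # Plain averaging at the server, as in a FedAvg-style aggregation (assumption).
    return float(noise.mean(axis=0).std())

if __name__ == "__main__":
    for eps in (3.0, 30.0):
        sigma = gaussian_sigma(eps, delta=0.01, sensitivity=1.0)  # hypothetical sensitivity
        for k in (10, 30, 50):
            measured = averaged_noise_std(sigma, k)
            predicted = sigma / math.sqrt(k)
            print(f"eps={eps} clients={k} measured={measured:.4f} predicted={predicted:.4f}")
```

Under these assumptions, the noise scale is inversely proportional to ε, and the noise remaining in the averaged update shrinks roughly as 1/√K for K clients, which is consistent with the qualitative behavior reported for Figures 3 and 5.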

Effects of Different Client Neural Network Optimizers on Model Performance. Optimizers are widely used in deep learning research. The purpose of neural network training is to find parameters that make the value of the loss function as small as possible. We added two commonly used optimizers, Adam and Momentum, to the algorithm for comparison with traditional stochastic gradient descent (SGD). Figure 7 shows the results of our optimizer experiment. With privacy budgets of 3, 6, and 9, 30 clients, gradient clipping clip = 30, and 30 model iterations, Adam and Momentum both outperform the SGD optimizer. In the case of ε = 9, the model classification accuracy can reach 93.98%, so the Adam optimizer can be selected as the optimizer for the actual deployment of the differential privacy federated learning model.
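
For readers who want to see how the three optimizers differ when gradients are clipped and perturbed, the following is a minimal sketch on a toy quadratic objective. It is not the DPFLAGD-DNN algorithm and does not use Fashion-MNIST. The learning rate, momentum coefficients, noise scale `sigma`, and the order of clipping and noise injection are hypothetical choices, and the helper names are introduced only for this illustration.

```python
import numpy as np

def clip_by_norm(grad, clip):
    """Rescale grad so that its L2 norm is at most clip."""
    norm = np.linalg.norm(grad)
    return grad * min(1.0, clip / (norm + 1e-12))

def sgd_step(w, g, state, lr=0.1):
    """Plain stochastic gradient descent update."""
    return w - lr * g, state

def momentum_step(w, g, state, lr=0.1, beta=0.9):
    """Heavy-ball momentum: accumulate a velocity and step along it."""
    v = beta * state.get("v", np.zeros_like(w)) + g
    return w - lr * v, {"v": v}

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates."""
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", np.zeros_like(w)) + (1 - b1) * g
    v = b2 * state.get("v", np.zeros_like(w)) + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), {"t": t, "m": m, "v": v}

def train(step, steps=200, clip=30.0, sigma=0.05, seed=0):
    """Minimise 0.5 * ||w - target||^2 using clipped, noised gradients."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0, 3.0])
    w, state = np.zeros(3), {}
    for _ in range(steps):
        g = clip_by_norm(w - target, clip)                # bound each gradient's norm
        g = g + rng.normal(0.0, sigma, size=g.shape)      # add Gaussian noise (assumed placement)
        w, state = step(w, g, state)
    return float(np.linalg.norm(w - target))              # distance to the optimum

if __name__ == "__main__":
    for name, step in (("SGD", sgd_step), ("Momentum", momentum_step), ("Adam", adam_step)):
        print(name, train(step))
```

The sketch only shows the update rules side by side; the ranking of the optimizers on this toy problem depends on the chosen hyperparameters and should not be read as a reproduction of Figure 7.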
We continue to use the controlled variable method to explore the relationship between the number of users participating in federated learning and model performance. We fix the privacy budget at ε = 6, the gradient clipping at clip = 100, and the number of iterations at 25, and vary the number of clients over 10, 30, and 50. The experimental results are shown in Figure 8. It can be seen that the model performs better as the number of clients increases.

Conclusions
This paper proposes a differential privacy federated learning model with adaptive noise based on correlation analysis to protect the data privacy of multiple users in the medical Internet of Things. The proposed algorithm provides two layers of protection for participating users: noise is added locally at the user side and again at the server side. Our method adaptively adds noise to input features according to the correlation between different features and the model output. Specifically, we add more noise to the neuron gradients that are less relevant to the model's output and inject less noise into the output-related features. In addition, we optimized the stochastic gradient descent algorithm of the neural network so that it selects the step size for gradient descent according to the privacy budget. We conducted detailed experiments on Fashion-MNIST. Experimental results show that our proposed algorithm can achieve high accuracy and has practical application value in the field of medical Internet of Things.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Wireless Communications and Mobile Computing