A Blockchain-Based Federated-Learning Framework for Defense against Backdoor Attacks

Abstract: Federated learning (FL) is a technique in which multiple participants update local models with private data and aggregate these models using a central server. Unfortunately, central servers are prone to single-point failures during the aggregation process, which leads to data leakage and other problems. Although many studies have shown that a blockchain can solve the single-point failure of servers, blockchains cannot identify or mitigate the effect of backdoor attacks. Therefore, this paper proposes a blockchain-based FL framework for defense against backdoor attacks. The framework utilizes blockchains to record transactions in an immutable distributed ledger network and enables decentralized FL. Furthermore, by incorporating the reverse layer-wise relevance (RLR) aggregation strategy into the participants' aggregation algorithm and adding gradient noise to limit the effectiveness of backdoor attacks, the accuracy of backdoor attacks is substantially reduced. In addition, we designed a new proof-of-stake mechanism that considers the historical stakes of participants and the accuracy of their local models when selecting miners, thereby reducing the stake rewards of malicious participants and motivating them to upload honest model parameters. Our simulation results confirm that, with 10% malicious participants, the success rate of backdoor injection is reduced by nearly 90% compared to Vanilla FL, and the stake income of malicious devices is the lowest.


Introduction
Federated learning (FL) [1] is a technique that involves multiple participants who update their local models with private data and aggregate the models using a centralized server. However, the centralized server cannot verify the legitimacy of the local models. Thus, the centralized server is easily attacked by malicious participants, for example, those implanting backdoors into their local models and causing the model to produce incorrect classification results.
In FL, backdoor attacks involve two roles: attackers and defenders. Attackers can control one or more participants (malicious participants). Defenders are usually servers, since in practice only trusted servers can perform federated aggregation; however, benign participants can also act as defenders in some cases, particularly when attackers can only change the local training samples of participants but cannot modify their training process or trained model.
A blockchain is a distributed ledger with the fundamental characteristic of converting traditional centralized solutions into a distributed network structure [2,3]. This ensures data security on the blockchain through asymmetric encryption and other cryptographic technologies. At the same time, consensus mechanisms, smart contracts, and other mechanisms ensure the reliability of data on the blockchain, which is distributed among multiple participants.

The main contributions of this paper are summarized as follows:

1. We use the blockchain to attain FL. Recording transactions in an immutable distributed ledger network improves the traceability, auditability, and tamper resistance of the joint model and avoids single-point failures of the centralized server.

2. We propose a new aggregation strategy in which participants independently determine how aggregation is performed in the model, combined with reverse layer-wise relevance (RLR) [5], and further add gradient noise to limit the effectiveness of backdoor attacks.

3. A new proof-of-stake consensus mechanism (PoSA) is designed that considers the historical stakes of participants and the accuracy of their local models. The PoSA mechanism reduces the stake rewards of malicious participants to motivate them to upload honest model parameters, thereby making the model-learning process more reliable and trustworthy.
This work may have considerable implications for future research and provide a practical solution for protecting privacy and security in FL.
This paper is organized as follows: In Section 2, we provide a summary of related work. Section 3 describes the proposed framework. We then discuss our experimental results in Section 4 and demonstrate the effectiveness of our approach in different environments. Finally, Section 5 provides our conclusions.

Related Work
FL is a distributed-learning paradigm that enables the centralized server to learn an accurate global model [5]. However, some participants in this process may be malicious, submitting malicious local models to the centralized server through backdoor attacks [6] and causing incorrect classification or decreased model accuracy after aggregation.
Based on the target of the attack, this article divides backdoor attacks into two types: attacks on training data and attacks on local models. Attacks on training data are further divided into attacks based on label flipping and attacks based on planting triggers. Attacks based on label flipping do not modify the input data, only the labels, whereas attacks based on planting triggers modify both the input data and the labels, effectively constructing an adversarial sample. Attacks on local models are divided into attacks that modify the training process and attacks that modify the trained model. The former occur during the training process, whereas the latter mainly occur after the model has been trained.
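To make the distinction concrete, the following is a minimal sketch of a label-flipping attack on toy MNIST-style labels; the source class "5" and target class "7" follow the experimental setup later in this paper, and the function name is illustrative.

```python
import numpy as np

def label_flip(labels, src=5, dst=7):
    """Label-flipping attack: change only the labels, leave inputs untouched.

    A trigger-planting attack would additionally stamp a fixed pattern into
    the input images before relabeling them.
    """
    return np.where(labels == src, dst, labels)

labels = np.array([5, 3, 5])
flipped = label_flip(labels)  # -> [7, 3, 7]
```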
A blockchain is a distributed shared ledger where all participants record all historical transaction models; it is decentralized and immutable [2]. Using the blockchain, FL promotes traceability, auditability, and tamper resistance, making the model-learning process Electronics 2023, 12, 2500 3 of 15 more transparent and secure [7]. Therefore, many model verification methods based on the blockchain are applied in the FL framework.
Islam et al. presented an FL-based data accumulation scheme that combined drones and blockchains to attain secure accumulation and privacy of the model [8]. Zhang et al. designed a blockchain-based model migration approach to achieve secure model migration and speed up the training of the model while minimizing computation costs [9]. Rückel et al. proposed an FL system that combines a blockchain, local differential privacy, and zero-knowledge proofs, using multivariate linear regression to achieve economic incentives, trust, and confidentiality requirements [10]. In another study, Dong et al. constructed a secure, reliable, decentralized federated learning system (FLock) based on a blockchain to detect and block malicious participants through on-chain smart contracts while motivating participants to upload and review model parameters honestly [11]. Stephanie et al. presented a secure multi-party computation-based ensemble FL with a blockchain that enabled heterogeneous models to collaboratively learn from the data of healthcare institutions without violating users' privacy [12]. Wang et al. designed a new block structure, new transaction types, and a credit-based incentive mechanism (PF-PoFL) that allowed for efficient model evaluation and fully decentralized reward allocation [13].
Kalapaaking et al. proposed blockchain-based FL with SMPC model verification to detect and defend against malicious model updates while maintaining the privacy of the model [14]. BEAS, the first N-party blockchain-based FL framework, provided strict privacy protection via improved gradient pruning during model training, and an anomaly detection protocol was proposed to reduce the risk of data poisoning attacks [7]. Baucas et al. proposed a platform using FL and private blockchain technology within a fog-IoT network, which can effectively preserve the privacy of patients and the integrity of the predictive service [15]. This paper uses differential privacy (DP) for privacy protection. Adding noise (which only requires the incorporation of pre-computed noise through an addition operation) is more efficient than using complex cryptographic tools. Moreover, using DP and the PoSA consensus, we can guarantee the trust and privacy of the entire training process, which is impossible with the currently used techniques.
We compare DBFL against existing state-of-the-art frameworks for decentralized FL. DBFL uses a multi-channel permissioned blockchain to store all model gradients, which enables rapid scalability, auditability, transparency, and trust among collaborating entities. Compared with these approaches, DBFL offers better communication efficiency and easy-to-implement data privacy and security guarantees. Table 1 shows the comparative analysis of the proposed DBFL framework against other existing frameworks.
Table 1. Comparative analysis of the proposed DBFL framework against existing frameworks. "P" is Participants; "A" is Aggregator; "I" is Inference; "T" is Training; "DPS" is Data Poisoning; "MPS" is Model Poisoning; "BA" is Byzantine Attack; "FSS" is Function Secret Sharing protocol; "SMPC" is Secure Multi-Party Computation; "DP" is Differential Privacy; "BC" is Blockchain; "IP" is Identity Privacy; "S" is Scalability; "AU" is Asynchronous Updates; "DPS" is Dynamic Participants; "D" is Decentralized; "PC" is Premature Convergence; "RM" is Reward Mechanism. "-" denotes a non-existent party; ☾ denotes an honest party; ☼ denotes a semi-honest party; ☀ denotes a dishonest party; × denotes that a property is not provided; √ denotes that a property is provided.

DBFL Operation Process
DBFL comprises a set of participants N = {N_1, N_2, ..., N_m} and, similar to Vanilla FL, executes the learning process through a series of communication rounds R = {R_1, R_2, R_3, ...}. All N ∈ N are assigned the following tasks in round R_j: global model aggregation and local model update. All participants receive the winning block block_{j-1} from the previous round R_{j-1} and add it to their blockchains. Using the local models recorded in block_{j-1}, participants apply the dynamic adaptive aggregation mechanism (Section 3.2) in round R_j to build the global model G_j. In R_j, all participants perform local updates based on the training samples in train_w and the number of local training epochs. This results in local model gradients L_j^w, which are perturbed using DP (Section 3.3) to obtain ~L_j^w. The perturbed gradient ~L_j^w and basic reward r_j^w are packaged as transaction tx_j^w(~L_j^w), and finally, participants send tx_j^w(~L_j^w) to the miner to which they are connected (Table 2).

Table 2. Symbolic representations used in our framework.

Symbol | Meaning
R_j | The jth round of communication
train_w | Node w's local private training dataset
L_j^w | Local model update of node w in round R_j
G_j | Global model update in round R_j
r_j^w | Basic reward obtained by blockchain node w
r_j^{wm-veri} | Verification signature reward obtained by a miner node
r_j^{wm} | Mining reward obtained by a miner node
α_j | Accuracy ratio of the winning miner's model
β_j | Historical stake ratio of the winning miner's model
block_j^m | Block mined by a miner node
block_j | Block mined by the winning miner node
ℒ(.) | Accuracy function
ℓ(.) | Historical stakes function
ζ(.) | Model aggregation method

A randomly selected subset of miners N^wm from the participants is assigned the following tasks: (i) verification and signature validation of the collected local models and (ii) collection, aggregation, and mining of all local updates into the ultimate winning block. If the signature of transaction tx_j^w(~L_j^w) is verified, N^wm extracts ~L_j^w from it; otherwise, N^wm does not broadcast the ~L_j^w packaged in the unverified transaction. Then, each miner N^wm broadcasts tx_j^w(~L_j^w) to all other miners. This ensures that every N^wm holds each tx_j^w(~L_j^w) and can thus access all ~L_j^w. At the same time, a miner N^wm receives a verification reward r_j^{wm-veri} for verifying the signature of a tx_j^w(~L_j^w). All local updates {~L_j^w} are then collected and placed in a privately constructed candidate block block_j^m. The block content is hashed and signed with the miner's private key, which is equivalent to proof-of-work mining with a difficulty of zero. The candidate block also contains all expected rewards r_j^w, r_j^{wm-veri}, and r_j^{wm}.
The participant with the highest score from N wm is selected as the best participant, and its constructed candidate block (block m j ) is published to the blockchain as the final legitimate block (block j ). All participants receive the winning block (block j ) for this round and add it to their blockchains [2]. Using the local models recorded in block j , participants use the dynamic adaptive aggregation mechanism to build a global model G j+1 for the next round of training.
Moreover, DBFL will first determine whether each participant N has successfully connected in each round ( Figure 1).
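The round just described can be illustrated with a toy end-to-end sketch: participants sign gradient transactions, miners verify signatures, the highest-scoring miner publishes the winning block, and every participant aggregates the recorded updates. The hash-based "signature", FedAvg aggregation, and all names here are simplifying assumptions for illustration, not the paper's exact transaction or block format.

```python
import hashlib

def sign(payload: str, secret: str) -> str:
    """Toy signature: hash of the payload plus the node's secret key."""
    return hashlib.sha256((payload + secret).encode()).hexdigest()

def run_round(local_grads, secrets, miner_ids, scores):
    # Each participant packages its (perturbed) gradient as a signed transaction.
    txs = [{"node": w, "grad": g, "sig": sign(repr(g), secrets[w])}
           for w, g in local_grads.items()]
    # Miners verify each transaction signature before accepting it.
    verified = [tx for tx in txs
                if tx["sig"] == sign(repr(tx["grad"]), secrets[tx["node"]])]
    # The miner with the highest score publishes the winning block.
    winner = max(miner_ids, key=lambda m: scores[m])
    block = {"miner": winner, "updates": [tx["grad"] for tx in verified]}
    # Every participant aggregates the recorded updates (FedAvg here).
    n = len(block["updates"])
    global_update = [sum(g[i] for g in block["updates"]) / n
                     for i in range(len(block["updates"][0]))]
    return block, global_update

grads = {"n1": [1.0, 2.0], "n2": [3.0, 4.0]}
secrets = {"n1": "k1", "n2": "k2"}
block, g = run_round(grads, secrets, ["n1", "n2"], {"n1": 0.7, "n2": 0.3})
# g == [2.0, 3.0]; block["miner"] == "n1"
```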

If a participant N_i fails to receive the winning block, it contacts other participants to obtain the lost blocks for aggregation but does not receive stakes.

Participant N_i Failed to Connect during Local Training
Upon reconnecting, N_i contacts other participants to obtain the lost blocks for aggregation but does not receive stakes.

Participant N_i Failed to Connect after Local Training
Upon reconnecting, N_i contacts other participants to obtain the lost blocks for aggregation and receives stakes. When a miner suddenly goes offline, the participants associated with it are processed according to Section 3.1.1.
In reality, each participant N can go offline at any time. However, in the comparative experiments, to test the effectiveness of backdoor attacks, we set the probability parameter T and the waiting times (T_w, T_m) so that participants are always online with infinite waiting [16,17]. The gradient aggregation rule (GAR) aggregates the gradients received from peers during each round. We designed the GAR to be robust against gradients generated by malicious participants using backdoor attacks.

Dynamic Adaptive Aggregation Mechanism
To improve the privacy of the FL aggregation process, we propose a dynamic adaptive aggregation mechanism that overcomes the drawback of traditional aggregation algorithms, which perform a single, undifferentiated aggregation. The proposed mechanism can adapt to local circumstances and determine the aggregation algorithm autonomously (Algorithm 1).

Algorithm 1. Dynamic adaptive aggregation (steps as recoverable from the source).
4 Check the legitimacy of block_{j-1}
5 If Verify(block_{j-1}) = True then
6   N adds block_{j-1} to its own blockchain
7   Take the gradient of each participant from block_{j-1}
8   Adaptively select the aggregation method ζ(.) combined with RLR
9 Else
10  N refuses to accept block_{j-1}
11 End

Furthermore, to defend against backdoor attacks, we require the aggregation algorithm of each participant to include the RLR aggregation strategy. This strategy adjusts the learning rate during aggregation based on the sign information of the updates submitted by participants in each global iteration. If the number of models updating in the same direction on a dimension exceeds a certain threshold, the learning rate on that dimension is kept so as to minimize the loss [18]. Otherwise (i.e., if the updates indicate that an attacker may be trying to drive the parameters toward an incorrect classification), the learning rate is updated in the direction that maximizes the loss on the unwanted dimension by negating the learning rate of that dimension (i.e., multiplying it by -1). We set the hyperparameter θ as the learning threshold: if the sum of update signs on a dimension reaches θ, the learning rate is multiplied by 1; otherwise, it is multiplied by -1.
In this paper, DBFL uses FedAvg with RLR [5] and coordinate-wise median (COMED) with RLR [5] as the GAR. See Appendix A for the relevant assumptions and theorems.
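The RLR sign-voting rule described above can be sketched in a few lines on top of FedAvg. The flat update vectors, the threshold value, and the function name are illustrative assumptions; the actual implementation operates per model dimension exactly as here.

```python
import numpy as np

def rlr_aggregate(updates, lr=0.01, theta=4):
    """RLR: per-dimension sign voting over participant updates.

    updates: list of 1-D parameter-update vectors, one per participant.
    Dimensions whose sign agreement reaches theta keep +lr; the rest are
    negated (multiplied by -1) to maximize loss along suspect dimensions.
    """
    stacked = np.stack(updates)                    # (n_participants, dim)
    sign_sum = np.abs(np.sum(np.sign(stacked), axis=0))
    per_dim_lr = np.where(sign_sum >= theta, lr, -lr)
    return per_dim_lr * np.mean(stacked, axis=0)   # FedAvg with per-dim lr

# Nine honest participants push dimension 0; one attacker pushes dimension 1.
honest = [np.array([1.0, 0.0])] * 9
attacker = [np.array([0.0, 1.0])]
delta = rlr_aggregate(honest + attacker, lr=0.1, theta=4)
# dim 0: agreement 9 >= 4, applied with +lr; dim 1: agreement 1 < 4, negated.
```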

Differential Privacy [19]
During each round, participant N_w exchanges its perturbed gradient with all other participants in the blockchain. Specifically, each participant maintains a true gradient L_j^w and a perturbed gradient ~L_j^w that it wishes to share. The entire exchange process can be summarized in the following steps (Algorithm 2):

1. Local gradient calculation. Compute the local gradient L_j^w by sampling from the random local dataset.

2. Addition of noise. Add stochastic Gaussian noise to the shared local gradient, with the noise variance represented by the input variable ε, yielding ~L_j^w.

3. Gradient broadcast. Transmit the perturbed local gradient ~L_j^w as a transaction to all other participants, and receive the local gradients of other participants from the winning block.

Algorithm 2. Broadcast gradients (steps as recoverable from the source).
1 Package gradient ~L_j^w and r_j^w into a transaction tx_j^w(~L_j^w)
2 Broadcast tx_j^w(~L_j^w) to the miners N^wm
3 End
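The noise-addition step can be sketched as follows. This is a hedged illustration only: we treat ε directly as the Gaussian noise scale, as Section 3.3 suggests, and the function name and seeded generator are assumptions, not the paper's exact calibration.

```python
import numpy as np

def perturb_gradient(grad, eps, rng=None):
    """Return grad plus element-wise Gaussian noise with scale eps.

    A larger eps means stronger perturbation of the shared gradient.
    """
    rng = rng or np.random.default_rng(0)  # fixed seed for reproducibility
    return grad + rng.normal(loc=0.0, scale=eps, size=grad.shape)

g = np.array([0.5, -0.25, 0.1])
g_tilde = perturb_gradient(g, eps=0.001)
# For small eps, the perturbed gradient stays close to the original.
```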

Threat Model
In FL, the training data are decentralized, and the aggregation server is only exposed to model updates. Given this, backdoor attacks are typically carried out by constructing malicious updates. The attacker tries to create an update that encodes the backdoor so that, when the malicious update is aggregated with other updates, the aggregated model exhibits the backdoor. A prominent way of carrying out backdoor attacks is through Trojans. A Trojan is a carefully crafted pattern that is leveraged to cause the desired misclassification. The poisoned data are generated by (i) extracting all the base-class instances constructed from the original validation data and (ii) adding backdoor patterns and relabeling them as the target class. In other words, models with backdoors classify base-class examples that contain backdoor patterns as the target class.
We assume that blockchain devices are rational [9]. These devices can evaluate their interests based on public information and maximize their benefits without performing operations that would harm them. Furthermore, we assume that blockchain devices are always on alert and do not trust each other.
We also assume that blockchain technology is credible. The blockchain is mainly maintained by devices with robust computation, storage, and communication capabilities and uses a consensus mechanism that is friendly to devices. Furthermore, most entities in the blockchain are reliable, and the records on the blockchain cannot be tampered with. Therefore, we can regard the blockchain as a trusted infrastructure, and we ignored attacks against it.

PoSA Blockchain Consensus
The PoSA blockchain consensus process deeply integrates the blockchain, PoSA consensus protocol [20], and GAR [16] aggregation functions. Specifically, the blockchain consensus includes two parts: an equity calculation and the choice of winners among the miners.

Calculation of Stakes
The PoSA consensus mechanism protects local model updates that are authorized for legitimate learning and ensures that these updates are recorded on the blockchain and used to update the global model. Because miners are responsible for aggregating local updates and recording them in a block, when a malicious device becomes a miner, it may attempt to disrupt the computation of the global model by placing false local updates and forged validator signatures in the blocks it mines. Therefore, avoiding blocks mined by malicious participants during selection is crucial for a robust blockchain FL.
Hence, inspired by the reward mechanism in VBFL [2] and reinforced by the role-switching strategy, PoSA rewards devices according to the roles they play, with r being the unit reward.
The various types of rewards are described below.
Basic reward. Participants that perform local updates in R_j receive a proportional reward based on the number of data samples in train_w used for training and the number of local training epochs (indicated by le_j^w). The basic reward for participants in R_j is calculated using Formula (1), where r is the unit reward. To encourage participants to partake in the construction of the model, the basic reward accounts for 75% of the total profit generated during the model construction process.
Signature verification reward. A participant's id is its public key, which is used to verify the signatures of the transactions or blocks generated by the validators N^wv. Participants partaking in mining tasks receive verification rewards r_j^{wm-veri} by verifying the signature of a local update transaction tx_j^w, calculated using Formula (2).
Mining reward. Participants in mining tasks in R_j verify the aggregated validator transactions {tx_j^v(L_j^w)} received from other participants and place them in a privately constructed winning block block_j^wm. After the block is published to the blockchain, participants receive mining rewards, calculated using Formula (3).
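Since Formulas (1)-(3) are not reproduced in this excerpt, the following bookkeeping sketch uses assumed proportional forms only (unit reward r scaled by data size, local epochs, and verified or aggregated transaction counts); every function name and formula here is a hypothetical illustration of the reward structure, not the paper's exact definitions.

```python
def basic_reward(r, n_samples, local_epochs):
    """Assumed form of Formula (1): proportional to data size and epochs."""
    return r * n_samples * local_epochs

def verification_reward(r, n_verified_txs):
    """Assumed form of Formula (2): proportional to verified signatures."""
    return r * n_verified_txs

def mining_reward(r, n_aggregated_updates):
    """Assumed form of Formula (3): proportional to aggregated updates."""
    return r * n_aggregated_updates

# A worker with 600 samples and 5 local epochs, at unit reward r = 0.01:
rw = basic_reward(0.01, 600, 5)  # proportional basic reward
```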

PoSA Miner Options
If workers actively train the model, PoSA rewards them with worker stakes to incentivize them to contribute substantial amounts of high-quality data and legally execute as many epochs as possible. Therefore, as the communication rounds continue, the accumulated stakes of a device reflect its total contribution to the entire learning process. When selecting blocks for the global model update, PoSA instructs participants to select the block produced by the miner with the highest score in N^wm. Because this miner makes the greatest contribution to the learning process, it is considered the most trustworthy, with the lowest probability of blocking the process.
The participant score calculation is divided into two parts: the model accuracy ratio and the historical stake ratio. The model accuracy ratio α_j is defined in Formula (4); we used a shared dataset (MNIST) to test the model accuracy of the blockchain miners so that all miners are evaluated on the same test set. The historical stake ratio β_j is defined in Formula (5), and the participant score is defined in Formula (6), where the trade-off coefficient ω ∈ (0, 1). In the current implementation, a round may arise in which N^wm is composed entirely of malicious devices. Malicious devices may also be selected as the winning miner if they hold more stakes than the other legitimate miners in N^wm. The role-switching strategy ensures that miners are randomly selected in each new round, reducing the probability that malicious devices are continuously assigned the miner's role. Furthermore, role switching can prevent "non-democratic side effects" [21] (i.e., the device with the highest stakes being continuously chosen as the winning miner). Preventing these side effects alleviates the risk of damage to devices and the possibility of attacks against the learning process. In Section 4, we validate the effectiveness of miner selection under the PoSA consensus.
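The miner-selection step can be sketched as follows. Because Formulas (4)-(6) are not reproduced in this excerpt, the ratio definitions below are assumptions: each miner's accuracy and historical stake are normalized against the totals over the candidate miner set N^wm, and the score combines them with the trade-off coefficient ω.

```python
def miner_score(acc, stake, total_acc, total_stake, omega=0.5):
    """Assumed PoSA score: omega-weighted sum of accuracy and stake ratios."""
    alpha = acc / total_acc        # model accuracy ratio (assumed Formula (4))
    beta = stake / total_stake     # historical stake ratio (assumed Formula (5))
    return omega * alpha + (1 - omega) * beta  # score (assumed Formula (6))

# Three candidate miners: (shared-test-set accuracy, historical stake).
miners = {"m1": (0.98, 120), "m2": (0.91, 300), "m3": (0.55, 500)}
total_acc = sum(a for a, _ in miners.values())
total_stake = sum(s for _, s in miners.values())
scores = {m: miner_score(a, s, total_acc, total_stake, omega=0.7)
          for m, (a, s) in miners.items()}
winner = max(scores, key=scores.get)  # publishes the winning block
```

With ω = 0.7, accuracy dominates stake, so a high-stake but low-accuracy miner such as m3 does not automatically win the round.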

Experiment
All experiments were implemented using PyTorch [22] on a virtual machine with an NVIDIA A100 GPU and an Intel(R) Xeon(R) CPU @ 2.60 GHz from Dell Computer in Urumqi, China. Multi-GPU training was not used in our experiments. We evaluated our framework on the public MNIST dataset.
The model architecture used in our experiments is from [18]: a five-layer convolutional neural network with about 1.2M parameters, composed of two convolutional layers, each followed by a max-pooling layer, and two fully connected layers.
The MNIST dataset is a subset of the NIST dataset. The training set contains 60,000 images and labels, while the test set contains 10,000 images and labels. Each image is a 28 × 28-pixel handwritten digit from 0 to 9, with pixel values ranging from 0 to 255.
We report the hyperparameters of our experiments and briefly discuss our choices. We start with a setting where data is distributed i.i.d. among participants (Table 3). Concretely, we use the MNIST dataset and give each participant an equal number of samples from the training data via uniform sampling. In all DBFL experiments, participants that performed the local update task received the most substantial stake rewards. Based on this, some participants were randomly selected for block-mining tasks. Each round of communication involved five local training iterations, a learning rate of 0.01, and a batch size of 128. We evaluated the performance of DBFL using the following standard metrics:

• Test error: the percentage of incorrect predictions made on the test dataset. We measured test errors with respect to rounds, network size, backdoor attacks, and privacy budgets.
• Stake accumulation: the accumulation of stakes by the participants in the blockchain.
The following subsections provide a detailed analysis of the different aspects of the DBFL experiments.

Network Size Convergence
For simplicity, we denote the plain FL scheme as "PURE" (i.e., it does not utilize any DP techniques, GAR, or blockchain systems) and use "DP" to denote the learning scheme based solely on PURE with DP techniques added. We compared DBFL with the PURE and DP schemes under non-attacked conditions. As shown in Figure 2, the test error almost converged after 20 rounds, but the fluctuations were large when N was small (≤5). When N = 20, all schemes achieved almost the same convergence.


Privacy Budget [19]
We tested our DBFL scheme by setting ε to 0.001, 0.005, and 0.02. Among these, ε = 0.001 represents the most robust privacy protection. The results shown in Figure 3 indicate that when N = 10, the convergence with ε equal to 0.001, 0.005, and 0.02 was similar. However, when N = 20, larger values of ε may have led to greater test errors. These results indicate that the trade-off between accuracy and privacy protection should be carefully adjusted based on the specific requirements of privacy protection and model accuracy.


Backdoor Attacks Caused by Malicious Participants
We provide empirical evidence to demonstrate the defense capabilities of our framework. The general process is as follows: we simulate an FL network with 10 participants, 10% of which are malicious. We randomly select one participant as the malicious participant and insert a backdoor into the training-set labels, causing images labeled "5" to be misclassified as "7" (Figure 4). The test set of the malicious participant is also injected with backdoor data to compare the backdoor accuracy. The poisoned data are generated by (i) extracting all the base-class instances constructed using the original validation data and (ii) adding backdoor patterns and relabeling them as the target class [23]. In other words, models with backdoors classify base-class examples that contain backdoor patterns as the target class.
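The two poisoning steps above can be sketched as follows: stamp a 5 × 5 trigger into the bottom-right corner of each base-class ("5") image and relabel it as the target class ("7"). The array shapes follow MNIST; the function name and trigger pixel value are illustrative assumptions.

```python
import numpy as np

def plant_trigger(images, labels, base=5, target=7, size=5, value=255):
    """Build poisoned samples: trigger-stamped base-class images, relabeled."""
    poisoned_x, poisoned_y = [], []
    for img, y in zip(images, labels):
        if y != base:
            continue                      # only base-class instances are used
        img = img.copy()
        img[-size:, -size:] = value       # 5x5 trigger, bottom-right corner
        poisoned_x.append(img)
        poisoned_y.append(target)         # relabel as the target class
    return np.array(poisoned_x), np.array(poisoned_y)

x = np.zeros((3, 28, 28), dtype=np.uint8)
y = np.array([5, 3, 5])
px, py = plant_trigger(x, y)
# px.shape == (2, 28, 28); every poisoned label equals 7.
```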


Figure 4. Examples before and after inserting the Trojan horse into the MNIST dataset. The Trojan-horse pattern is a 5 × 5 "@" placed in the bottom-right corner of the object, and the backdoor task causes the model to classify a Trojan-horse "5" as "7".
After receiving and aggregating the updates, we measure two key performance indicators of the aggregation algorithm: validation accuracy (%), the global model's accuracy on the clean validation set before backdoor injection, and backdoor accuracy (%), the success rate with which data carrying backdoor patterns are classified as the target class.
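Under the same naming assumptions, the two indicators reduce to plain accuracy computations over the clean validation set and the poisoned test set, respectively:

```python
import numpy as np

def accuracy(preds, labels):
    """Fraction of predictions matching the given labels."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))

# Validation accuracy: predictions vs. clean validation labels (toy values).
val_acc = accuracy([7, 1, 5], [7, 1, 5])
# Backdoor accuracy: predictions on triggered inputs vs. the attacker's target class.
bd_acc = accuracy([7, 7, 5], [7, 7, 7])
```

A strong defense keeps val_acc high while driving bd_acc toward the chance level.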
As shown in Table 4, DBFL1, DBFL2, and DBFL3 denote gradient aggregation rules (GARs) that combine FedAvg with RLR and COMED with RLR as the model aggregation methods in different ratios ((0.8, 0.2), (0.5, 0.5), and (0.2, 0.8), respectively). COMED with RLR performs better than FedAvg with RLR; thus, as the percentage of participants using COMED with RLR as the aggregation strategy increases, the accuracy of the global model increases.
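A minimal sketch of the RLR idea combined with the two aggregators is shown below. The sign-agreement threshold, parameter names, and flattened-update representation are assumptions; COMED is taken to be the coordinate-wise median:

```python
import numpy as np

def rlr_aggregate(updates, theta=2, eta=1.0, method="mean"):
    """Flip the per-coordinate learning-rate sign where clients' update signs disagree."""
    updates = np.stack(updates)                      # shape: (clients, params)
    agreement = np.abs(np.sign(updates).sum(axis=0)) # per-coordinate sign agreement
    lr = np.where(agreement >= theta, eta, -eta)     # reverse lr on low agreement
    if method == "mean":                             # FedAvg-style aggregation
        agg = updates.mean(axis=0)
    else:                                            # COMED: coordinate-wise median
        agg = np.median(updates, axis=0)
    return lr * agg

u = [np.array([1.0, 1.0]), np.array([1.0, -1.0]), np.array([1.0, -1.0])]
out = rlr_aggregate(u, theta=2)
```

In the example, the first coordinate has full sign agreement and is aggregated normally, while the second coordinate's low agreement reverses the update direction, which is what limits a minority attacker's backdoor gradient.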

Effectiveness of PoSA
To evaluate the effectiveness of PoSA, we selected a case in which 10% of the participants performed backdoor attacks [2,23] and observed the accumulation of stakes for each participant. Taking DBFL3 as an example, we considered four cases representing different proportions of the historical stake in a participant's score, with N denoting the number of malicious participants performing backdoor attacks. When only the historical stake was considered, the stakes of the malicious participants were unaffected. When ω = 0.5 or 1, the stakes of the malicious participants were the lowest, indicating that participants' local model gradients are crucial when evaluating them.
The miner selection mechanism in PoSA directly affects the miners' rewards. Hence, we can evaluate the effectiveness of legal miner selection in PoSA by examining whether malicious participants win when miners are chosen based on model accuracy and historical stakes. For PoSA, we set the waiting time for block propagation to infinity so that each miner could complete its block mining and receive the propagated blocks from all other miners; the last propagated block was determined to be the legal block immediately upon reception. The curves of accumulated stakes (Figure 5) reveal that, because our framework considers both the historical stakes and the accuracy of the participants' models, the stakes of malicious participants grow slowly and remain the lowest across rounds. Therefore, in the proposed framework, the blocks of malicious participants are less likely to be selected as the winning block.
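The selection rule described above can be sketched as a weighted score over historical stake and local-model accuracy. The weight ω, the linear score formula, and the field names are illustrative assumptions, not the paper's exact scoring function:

```python
def posa_score(historical_stake, model_accuracy, omega=0.5):
    """Miner score mixing historical stake and validated local-model accuracy."""
    return omega * historical_stake + (1 - omega) * model_accuracy

participants = {
    "honest":    {"stake": 0.4, "acc": 0.95},
    "malicious": {"stake": 0.4, "acc": 0.30},  # backdoored model scores low on validation
}
winner = max(participants,
             key=lambda p: posa_score(participants[p]["stake"], participants[p]["acc"]))
```

With equal historical stakes, the accuracy term dominates, so the honest participant wins the block and the malicious participant's stake income stagnates, matching the trend in Figure 5.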

Conclusions and Future Work
This article addresses how to effectively coordinate FL processes while maintaining learning security and user privacy [24]. We propose an FL framework that withstands backdoor attacks in a blockchain environment by incorporating the RLR aggregation strategy into the participants' aggregation algorithm and adding gradient noise to limit the effectiveness of such attacks. The framework thereby minimizes the risk of backdoor injection and enhances the robustness of FL. Our DBFL framework also implements various blockchain functions, such as signature verification and simulated chain resynchronization.
The DBFL framework proposed in this article runs in simulation mode. Hence, the development of more effective blockchain data structures, chain resynchronization algorithms, and fault tolerance mechanisms is needed to test the performance of DBFL in actual distributed systems [2].

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
We now turn to deriving the convergence rate for full-batch FedAvg with RLR. Let $\omega$ denote the model weights and $\xi_k$ the randomness caused by local batch variability. We use $\mathbb{E}$ to denote the expectation with respect to all random variables. Let $g_k^t$ be the gradient of the $k$-th participant at the $t$-th round, i.e., $g_k^t = \nabla f_k\left(\omega_{t-1}^k, \xi_k^t\right)$, and $\mathbb{E}_{D_k}\left[g_k^t \mid \mathcal{F}_t\right] = \nabla f_k\left(\omega_{t-1}^k\right)$, where $\mathcal{F}_t$ is the filtration generated by all random variables up to step $t$, i.e., a sequence of increasing $\sigma$-algebras with $\mathcal{F}_s \subseteq \mathcal{F}_t$ for all $s < t$.
Finally, following Bernstein et al. [25], we assume that for all $t, k \in \mathbb{Z}$, each component of the stochastic gradient vector $g_k^t$ has a unimodal distribution that satisfies population-weighted symmetry [26]. In particular, let $W$ be a random variable symmetric around zero, i.e., $\Pr(W \le -w) = \Pr(W \ge w)$ for each $w > 0$. We now consider a family of asymmetric distributions constructed by distorting an arbitrary symmetric distribution with a scalar parameter $\beta > 0$ such that $\Pr(W_\beta = 0) = \Pr(W = 0)$ and, for all $w > 0$,
$$\Pr(W_\beta \le -w) = \frac{2\Pr(W \ge w)}{1+\beta}, \qquad \Pr(W_\beta \ge w) = \frac{2\beta\Pr(W \ge w)}{1+\beta}. \tag{A1}$$
Condition (A1) is referred to as population-weighted symmetry. For β = 1, (A1) reduces to a standard symmetric distribution and corresponds to the assumption in [25]. For β ≠ 1, (A1) describes a class of asymmetric distributions [27]. As such, (A1) allows us to consider a broader class of distributions than those symmetric around the mean, as in the case of Bernstein et al. [25].
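As a quick sanity check (not in the original), substituting β = 1 into the distorted tail probabilities of (A1) recovers the symmetric case:

```latex
\Pr(W_{1} \le -w) = \frac{2\Pr(W \ge w)}{1+1} = \Pr(W \ge w),
\qquad
\Pr(W_{1} \ge w) = \frac{2 \cdot 1 \cdot \Pr(W \ge w)}{1+1} = \Pr(W \ge w),
```

so both tails coincide with $\Pr(W \le -w) = \Pr(W \ge w)$, i.e., the standard symmetry assumption of Bernstein et al. [25].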