Harnessing Dynamic Heterogeneous Redundancy to Empower Deep Learning Safety and Security

The rapid development of deep learning (DL) models has been accompanied by various safety and security challenges, such as adversarial attacks and backdoor attacks. By analyzing the current literature on attacks and defenses in DL, we find that the ongoing adaptation between attack and defense makes it impossible to completely resolve these issues. In this paper, we propose that this situation is caused by the inherent flaws of DL models, namely non-interpretability, non-recognizability, and non-identifiability. We refer to these issues as the Endogenous Safety and Security (ESS) problems. To mitigate the ESS problems in DL, we propose using the Dynamic Heterogeneous Redundancy (DHR) architecture. We believe that introducing diversity is crucial for resolving the ESS problems. To validate the effectiveness of this approach, we conduct various case studies across multiple application domains of DL. Our experimental results confirm that constructing DL systems based on the DHR architecture is more effective than existing DL defense strategies.


Introduction
Deep learning (DL) models have achieved remarkable advancements in recent years, transforming various industries with their powerful capabilities, such as autonomous driving [1] and robotic surgery [2]. Despite these breakthroughs, DL models are not without vulnerabilities. They are susceptible to sophisticated threats that can compromise their integrity and effectiveness, including adversarial attacks, backdoor attacks, and poisoning attacks.
Adversarial attacks [3,4] involve deliberately altering input data to deceive DL models into making incorrect decisions. Backdoor attacks [5,6] introduce specific trigger patterns during the training process, causing the model to produce incorrect outputs when these triggers are present during its operation. Poisoning attacks [7,8] compromise the training set by introducing malicious data, thereby degrading the model's overall performance or causing it to behave erratically. These attacks significantly threaten the safety and security of DL models.
There is a proliferation of AI applications based on deep learning, including natural language processing, machine translation, computer vision, and large-scale language generation models. However, as deep learning technology is widely applied in the AI field, it also faces security threats, the primary ones being adversarial attacks and backdoor attacks. Addressing the security issues of deep learning models is a highly challenging task.

Adversarial Attack
Adversarial attacks involve introducing subtle perturbations to the inputs of machine learning models, causing them to produce incorrect predictions with high confidence. The concept of adversarial samples, which are inputs intentionally altered to deceive the model, was first proposed by Szegedy et al. [17]. These adversarial samples, created by making minor yet significant changes to clean data, pose a serious challenge to learning-based classifiers [18,19], especially in security-sensitive environments where model robustness is crucial. As research into adversarial attacks has advanced, a variety of attack methods have been identified. Attacks are generally classified into white-box and black-box categories based on the attacker's knowledge of the target model. White-box attacks, which assume complete access to the model's architecture and parameters, include techniques such as L-BFGS [20], which uses quasi-Newton optimization, FGSM [21], which employs gradient sign information, and iterative methods like BIM [19]. The C&W attack [22] utilizes different optimization objectives to craft adversarial examples, while PGD [23] uses projected gradient descent, and JSMA [24] relies on the Jacobian matrix. In contrast, black-box attacks occur when the attacker lacks detailed knowledge of the target model and typically involve generating adversarial samples using substitute models or estimating decision boundaries without gradient information [25,26]. As adversarial sample generation techniques become more sophisticated and harder to detect, the development of effective defense mechanisms has become increasingly important. Integrated attacks, such as AutoAttack [27], which combine multiple strategies, further complicate the defense landscape, highlighting the urgent need for robust and comprehensive security solutions in machine learning.
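To make the white-box setting concrete, the following sketch crafts FGSM adversarial examples by perturbing an input along the sign of its loss gradient; the model, loss, and ϵ value are illustrative placeholders rather than settings drawn from the cited works.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """FGSM sketch: x_adv = x + epsilon * sign(grad_x loss), clipped to the valid range."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One-step perturbation along the gradient sign direction.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```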

Backdoor Attack
A backdoor attack implants a hidden entry point into a deep learning model, allowing it to function normally on clean samples but exhibit specific abnormal behaviors when a trigger is present. The first backdoor attack, known as BadNets [28], works by adding triggers to part of the training data and altering the corresponding labels to the target category. This makes the model learn an incorrect association between the trigger and the target category, so that any sample containing the trigger is misclassified into the target category, while the model still behaves normally on clean samples. This approach has become the baseline for backdoor attacks in the field of computer vision. Later, more advanced backdoor attack methods were proposed. To evade human recognition, optical triggers imperceptible to the human eye are employed [29,30]. Researchers also demonstrate that the success rate of backdoor attacks is closely related to the form of the backdoor trigger; therefore, finding a trigger that is easy for a specific model to learn is also crucial. In light of this, Li et al. [31] proposed a bilevel optimization based on the L_p norm to optimize a trigger; this type of trigger is not only visually difficult to detect but also significantly enhances the effectiveness of the attack. Given that backdoor attacks based on data poisoning typically involve altering labels, inconsistencies between image content and labels are easily detected by human eyes. To address this issue, Turner et al. proposed a method called the label-consistency attack [32]. Because backdoor attacks behave normally on clean samples, it is difficult to discover that a deep learning model has been implanted with a backdoor, which poses a significant threat to the secure deployment of deep learning models.
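As an illustration of how such data poisoning operates, the sketch below stamps a simple corner patch onto a fraction of a training batch and flips those labels to the attacker's target class; the patch shape, poison rate, and target class are hypothetical choices, not the configurations used in the cited attacks.

```python
import torch

def poison_batch(images, labels, target_class=0, poison_rate=0.1, patch_size=3):
    """BadNets-style poisoning sketch for NCHW float images in [0, 1]."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_rate * images.size(0))
    idx = torch.randperm(images.size(0))[:n_poison]
    # Stamp a white trigger patch in the bottom-right corner of the selected images.
    images[idx, :, -patch_size:, -patch_size:] = 1.0
    # Relabel the poisoned samples to the attacker's target category.
    labels[idx] = target_class
    return images, labels
```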

Endogenous Safety and Security
Endogenous safety and security refers to structures, algorithms, and their institutional mechanisms that have endogenous safety or security effects [33]. Endogeneity delineates an inherent effect engendered within a system, stemming autonomously rather than being contingent upon exogenous influences. Consequently, ESS denotes the safety or security attributes derived through internal mechanisms encompassing system architecture, algorithms, mechanisms, or scenarios [34]. Any software, hardware, or algorithm inevitably harbors invisible side effects beyond its fundamental functions. Once triggered, these side effects can negatively impact the normal operation of the basic functions. Such effects are defined as ESS problems in cyberspace. To address ESS problems, current solutions involve the redesign of network architectures and incremental patching, such as ESS solutions based on mimic defense [35], trusted-computing-based trusted network ESS solutions, and zero-trust-architecture-based ESS solutions. Among these, the mimic-defense-based ESS solution normalizes the ESS threats of target objects into unknown disturbances that can be managed by reliability and robustness control theories and methods. Mimic defense employs conditional evasion methods to prevent attackers from forming effective attacks, ensuring that inevitable ESS problems do not escalate into systemic security threats [36]. Mimic defense, as a versatile security technology, is gradually being applied and commercialized. There is a plethora of applications of mimic defense, including cloud infrastructure [37,38], network slicing protection schemes, and blockchain security enhancement solutions. Mimic-based domain name servers, web servers, and other systems have already been deployed and put into operation. In the field of deep learning (DL) security, there have been several studies on ensemble models, emphasizing that model diversity enhances their robustness, for instance the GAL [39] method, which is predicated on gradient diversity, the ADP [40] method, which relies on model behavior diversity, and the PDD [41] method, which is based on differentiated dropout. These methods are conceptually similar to ESS and have significantly improved model robustness. As previously outlined, deep learning, despite its widespread applications, is presently confronted with substantial security challenges. Several methods proposed from the ESS perspective have achieved significant improvements in security performance across various fields. In the emerging domain of deep learning, existing research has demonstrated that introducing diversity has a notable effect on enhancing robustness. This approach aligns closely with the strategies used by ESS to address security issues. In this paper, we attempt to introduce ESS into the DL field by constructing a DL model system with an endogenous security architecture to address the low robustness of single models.

ESS Problems in DL
The safety and security of DL models are crucial to their widespread application, especially in safety-critical systems. As described in Section 2, we attempt to address DL issues from the perspective of ESS. This paper focuses on the ESS properties of DL, which are defined below.
Definition 1. Endogenous Safety and Security (ESS) of Deep Learning (DL) refers to the safety and security functions or properties that DL models obtain through their inherent factors, such as model architecture, learning algorithms, and processing mechanisms.
As defined in Definition 1, ESS in DL concerns only the safety and security that stem directly from the DL model itself, not those that arise from the environment, such as the application in which the model is used. For example, a DL model might be robust to adversarial perturbations added to the input. This robustness is considered an ESS property of DL, as it can be enhanced by advanced training strategies such as adversarial training [9]. In contrast, the legality of DL is not an ESS property, as legality depends on whether the use of DL models complies with laws and regulations, which are extrinsic factors derived from human morality and ethics. Correspondingly, ESS problems in DL have a corresponding definition (Definition 2). Existing characterizations of DL safety and security problems tend to be relatively trivial and overly specific. As shown in Figure 1, the identified safety and security threats to DL include adversarial attacks, backdoor attacks, DeepFakes, poisoning attacks, and privacy disclosures, among others. Current research typically focuses on one specific safety or security problem but overlooks the relationships and distinctions between them. To bridge this gap, in addition to the categorization of ESS and non-ESS problems, this paper proposes to further divide the ESS problems into individual and common problems. As shown in Figure 2, individual problems pertain to problems within DL algorithms, whereas common problems relate to the operational environment of DL.

Individual Problems
The ESS individual problems are attributed to the 'genetic defects' of DL algorithms. By delving into current research on DL safety and security, we have identified that these genetic defects manifest as 'three inabilities' in DL algorithms. These inabilities, which represent structural contradictions within DL models, include Non-Interpretability, Non-Recognizability, and Non-Identifiability.
Non-Interpretability Due to the black-box nature of the learning and inference processes of DL, DL models are often considered non-interpretable. This has given rise to Explainable Artificial Intelligence (XAI) research [42] focused on DL. Nevertheless, to date, the process by which DL learns knowledge and rules from training data remains unclear. The internal learning process of DL is considered a black box, difficult to accurately describe and even harder for humans to understand. This complexity makes it challenging to locate safety and security problems in DL, as the models are based on data-driven training and fitting mechanisms. Problems may arise concerning the authenticity and completeness of the data, or the robustness and generalizability of the model, among other aspects, but pinpointing these problems can be quite challenging.
Non-Recognizability The non-recognizability of DL models is attributed to the data-driven learning framework of current DL techniques. DL models make predictions based on the training data on which they are fitted. Consequently, the quality and source of the training data can greatly influence the models' outputs. DL models lack the ability to recognize whether an output is correct or incorrect, or whether it is fair or biased. This misalignment between the ethical standards of DL models and those of human beings complicates the assessment of model outputs. For example, after going online, Microsoft's chatbot was fed a large amount of inappropriate data and became 'corrupted' within 16 hours, continuously emitting profanities. This incident highlights the challenge of ensuring that DL model outputs align with human values across diverse cultural, ethnic, educational, and cognitive backgrounds. Currently, there is no effective technical solution to this problem.
Non-Identifiability DL models excel at inductive reasoning, deriving patterns from known data. However, they typically struggle to understand and make judgments about unfamiliar, unexperienced phenomena and are even less capable of predicting and reasoning about medium- to long-term future changes. The reason is that although DL models may receive more information and knowledge than any individual human, they do not actually generate new knowledge. DL is still limited to extracting knowledge patterns from known data. With respect to dynamic knowledge and unknowns, there remains a gap compared with the human ability to draw inferences from a single instance. This also indicates that the safety and security problems of DL cannot be addressed by DL alone, as such a deduction contains a logical contradiction: DL cannot foresee or infer the existence of uncertain security threats based solely on existing knowledge.

Common Problems
Common problems refer to issues in DL systems that, like other information systems, arise from external dependencies on devices and environments. Like other application systems, an AI application system relies on physical information systems, so its algorithm-model "base" is bound to face common ESS problems. Domestic and foreign research reports show that there are widespread security vulnerabilities in the software and hardware environments on which mainstream deep learning frameworks rely. Once these vulnerabilities are exploited by attackers, AI systems face the risk of destruction, tampering, and information theft. In terms of software, platforms such as TensorFlow, Torch, and Caffe have all been reported to have security vulnerabilities. According to data from GitHub, an open-source software community, since 2020 TensorFlow has been exposed to more than a hundred security vulnerabilities, which can lead to system instability, data leakage, memory corruption, and other problems. In 2021, the "360 Company" conducted a security evaluation of mainstream domestic and international open-source AI frameworks and found more than 150 vulnerabilities in 7 machine learning frameworks (including the currently most widely used TensorFlow, PyTorch, etc.), and more than 200 vulnerabilities in the framework supply chain. This finding was corroborated by the DoS attacks, evasion attacks, and system downtime in TensorFlow, Caffe, and Torch exposed in 2017. A Tencent security team also found major vulnerabilities in a TensorFlow component, which could allow robot programs written by developers based on this component to be easily controlled remotely by hackers.
In terms of hardware, the GPU hardware products on which AI systems mainly rely also have security vulnerabilities. The most severe are the "Meltdown" and "Spectre" vulnerabilities exposed in 2018, which affected multiple series of products including GeForce, Tesla, Grid, NVS, and Quadro, covering most of the product lines of NVIDIA, a mainstream GPU manufacturer. In the same year, researchers at the University of California, Riverside, targeted security vulnerabilities in NVIDIA GPUs [26] and discovered three methods that could be used by hackers to breach user security and privacy. In addition, research shows that a neural network model can be destroyed through GPU/CPU overflow, which invalidates the model or turns it into a backdoored network.

Addressing ESS Problems in DL

Diversity Promoting ESS
Due to the uncertainty inherent in deep learning systems, no single deep learning model can be easily trusted. Furthermore, this inherent uncertainty within deep learning models renders security enhancements targeted solely at a single model unreliable. This situation prompts us to consider whether security can instead be achieved through system construction in application systems. Without relying on prior knowledge bases, if a model or constructive mechanism can transform uncertain disturbances in the target object's environment into differential-mode or common-mode ESS events with controllable probability, then such a system can operate securely even when it incorporates unreliable models. To realize such an application system, we advocate for the integration of diversity in system design. By leveraging diverse models, we can utilize the differential-mode outputs to mask errors inherent in any single model.
Diversity refers to the simultaneous integration of various models within the system. The diverse models integrated within the system exhibit identical functionalities during normal input-output operations. However, when faced with anomalies or attacks, they can generate a differential-mode response, leading to noticeable disparities in the results. The application system uses fusion algorithms designed to arbitrate the final output. This ensures that even if a single model experiences a security incident resulting in abnormal output, the overall system output remains correct.
To analyze the significance of diversity, we introduce the notion of a safety space. Using classification tasks as an example, the objective in enhancing a model's robustness is to improve its capability to accurately classify perturbed samples. Define the "safety space" S_f for a model f and an input x with true label l as

S_f = { ρ | f(x + ρ) = l }.

This space represents the range of perturbations ρ that can be applied to x while the model's output remains consistent with the true label l. In other words, if a perturbation ρ ∈ S_f, then for the perturbed sample x + ρ the model still outputs the correct label, i.e., f(x + ρ) = l. However, perturbations that fall outside of this safety space can lead to misclassifications. The robustness of a model is therefore tied to the size and integrity of this safety space, with larger spaces indicating greater robustness to perturbations.
The safety space of an ensemble model, S_F, is jointly determined by its sub-models. Let F be an ensemble model with n sub-models f_1, ..., f_n whose output is obtained by averaging the sub-models' predicted probabilities:

F(x) = (1/n) Σ_{i=1}^{n} P_{f_i}(x).    (1)

According to Equation (1), the safety space of the ensemble model F can be defined as

S_F = { ρ | Σ_{i=1}^{n} P_{f_i}(l | x + ρ) > Σ_{i=1}^{n} P_{f_i}(l_f | x + ρ) },    (2)

where l_f denotes the incorrect label with the highest sum of probabilities, and P(·) signifies the prediction probabilities. In essence, if an attack ρ does not lead a majority of models to converge on the same incorrect result, the integrated system is unlikely to produce an error.
For the same input, ensembling multiple models amounts to overlaying their safety spaces. As illustrated in Figure 3, if there are three models with safety spaces S_{f1}, S_{f2}, and S_{f3}, overlaying them yields the overlapping regions S_1, S_2, and S_3. The safety space S_F of the ensemble model is formed by combining the sub-models' safety spaces according to Equation (2). As depicted in Figure 3, the overlap of S_1, S_2, and S_3 notably expands the secure space. Under this analysis, multiple models can complement each other to adapt to attacks, thereby enhancing robustness. The key question is how to ensure that the mutual coverage of the models' safety spaces is sufficiently large and that the results within the covered region satisfy Equation (2). Based on this analysis, we propose introducing the necessary diversity by constructing a Dynamic Heterogeneous Redundancy (DHR) architecture to enhance the system's robustness.
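To connect this analysis to code, the sketch below checks whether a given perturbation lies in the ensemble safety space S_F by averaging the sub-models' predicted probabilities, following Equation (2); the sub-models are assumed to be standard PyTorch classifiers and the input a single-sample batch.

```python
import torch
import torch.nn.functional as F

def in_ensemble_safety_space(models, x, rho, true_label):
    """Return True if rho is in S_F: the averaged (summed) class probabilities of the
    sub-models still rank the true label first on the perturbed sample x + rho."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x + rho), dim=-1) for m in models]).mean(dim=0)
    return int(probs.argmax(dim=-1)) == true_label
```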

Enhancing DL Models through DHR
AI application systems face ESS problems at two levels: the "base" of software/hardware environments and the "ontology" of model algorithms, so the threats and challenges they face are more severe. Regarding common problems, the practical development of ESS in cloud platforms, storage systems, routing and switching, and other network devices in recent years has given the "base" environments of AI application systems, such as information communication networks, clouds, and data centers, the ESS attributes of trusted services, which provides a feasible solution to the common ESS problems of AI. With respect to individual problems, we take deep neural networks (DNNs), currently a hot research topic, as an example for research and discussion.

Motivation for the Enhancement Design
According to the analysis above, a neural network performs feature engineering that fits the data through gradient-based optimization. This process makes existing models attend to all features rather than ignoring subtle features as human beings do. Currently, both black-box and white-box adversarial attacks fundamentally rely on crafting deceptive samples to mislead models. This is done by employing various optimization strategies, such as gradient descent on the model loss, while ensuring minimal interference with microscopic features. Just as the vulnerabilities and backdoors in the software/hardware of an information system cannot be predicted or exhausted in advance, the optimization methods adopted by current neural networks can only try their best to approach, but can never achieve, the perfect goal of "understanding" everything when the training set is limited. This problem is an architectural defect of the neural network itself and can be regarded as a vulnerability in the neural network model algorithm.
From this point of view, adversarial attacks, as the most representative individual ESS problem of AI, are similar to common ESS problems in terms of root causes, presentation forms, and exploitation methods, and it appears that they can be defended against in an integrated manner from the perspective of ESS. From the perspective of the ESS defense paradigm, the feasibility of implementing security protection at the algorithm level of the DNN model, based on the dynamics, variety, and redundancy of the DHR architecture and the SR-FC mechanism, mainly rests on the following two aspects. (1) The adversarial behavior of each functionally equivalent executor conforms to the Relatively-true Axiom. It has been found that the correct classification boundaries of networks with different structures are similar when searching for gradients in different directions at the same decision point [34,35], as a result of which the same disturbance can make different models go wrong; that is, adversarial attacks are transferable. At the same time, however, the direction of gradient descent in each model is highly random, which leads to different wrong results, so it is difficult to achieve a transferable attack against a specific target. Therefore, the effectiveness of the diversity and heterogeneity of neural network sub-models in DHR is guaranteed.
(2) The input and output interfaces of functionally equivalent reconfigurable executors can be normalized or standardized. For homogeneous neural networks, the input data to be identified or classified is their normalized input, and the identification or classification result is their normalized output. On this interface, under the excitation of a given input sequence, functionally equivalent neural network sub-model executors have the same probability distribution over output vectors or states, which makes it possible to judge and ensure the equivalence among sub-model executors through consistency tests of a given function or performance. For heterogeneous models, normalization methods need to be studied further in the future. However, based on the final conclusions derived from inputs and outputs, the output targets can theoretically be normalized by transforming the outputs. Based on the above, we believe that it is feasible to construct AI application systems with ESS characteristics. This paper explores the use of the DHR architecture to modify AI application systems. By leveraging diversity to generate endogenous security effects, the approach aims to enhance the overall system's robustness. The ultimate goal is to create a robust system that maintains resilience even when the robustness of individual models is insufficient.

A Design of an ESS AI Defense Framework
The DHR architecture has been proven in practice to be an effective approach to security within its defensive scope. Figure 4 shows the DHR-based ESS defense framework for AI, in which multiple functionally equivalent neural network sub-models are used to construct a heterogeneous, redundant operating environment: the input agent distributes samples to each sub-model for independent processing, and the identification or classification results enter the strategic ruling. For normal samples, each sub-model gives the same or similar results. For adversarial samples, the sub-models are triggered to produce differential-mode outputs, so they are very likely to be discovered by the ruling module; the error-correction output link and the system scheduling module are then activated, and the algorithm models are dynamically replaced according to certain rules, thus evading the current adversarial attack.
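A minimal sketch of this framework is given below, assuming each sub-model is a callable that returns a discrete label; the voting rule, differential-mode check, and rotation policy are simplified illustrations of the ruling and scheduling modules, not the exact mechanisms of a deployed DHR system.

```python
from collections import Counter

class DHRClassifier:
    """Sketch of the DHR defense framework: heterogeneous executors process each input
    independently, a ruling module takes the majority output, and a scheduler rotates
    out dissenting executors whenever a differential-mode output is observed."""

    def __init__(self, active_models, reserve_pool):
        self.active = list(active_models)   # functionally equivalent executors in service
        self.reserve = list(reserve_pool)   # heterogeneous spares for dynamic scheduling

    def predict(self, x):
        votes = [model(x) for model in self.active]        # input agent fans out the sample
        label, count = Counter(votes).most_common(1)[0]    # strategic ruling: majority vote
        if count < len(votes):                             # differential-mode output detected
            self._reschedule(votes, label)
        return label

    def _reschedule(self, votes, majority_label):
        # Dynamically swap each dissenting executor with a reserve model, if any remain.
        for i, vote in enumerate(votes):
            if vote != majority_label and self.reserve:
                replacement = self.reserve.pop(0)
                self.reserve.append(self.active[i])
                self.active[i] = replacement
```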
Under the above defense framework, how to mine and construct effective diversity for neural networks becomes the key. The core elements of neural networks include the dataset, the network model, and the training method, all of which can serve as entry points for constructing heterogeneous sub-models. It should be noted that, because of the similarity in the internal mechanisms of trained models and training methods, even if there are differences in model structure, the learned features and the way they are learned remain similar, so the same adversarial sample can make different models go wrong; that is, the problem of transferability still exists. Therefore, if we want to achieve system-level robustness through dynamics and unknowns, we need to conduct in-depth research and experiments to further explore how to obtain differentiated neural network models. The core component of constructing the DHR architecture is the heterogeneity of its sub-executors, i.e., the introduction of the necessary diversity. To validate the feasibility of this architecture, we subsequently conducted various case studies on diversity verification, which are described in more detail in Section 5.


Case Studies
To validate the feasibility of our approach, we attempt various methods of diversity construction. We set up four scenarios for testing: adversarial defense, backdoor defense, poisoning defense, and real-world application scenarios. For each scenario, we select typical tasks, such as image classification and object detection. By comparing the performance of individual models, we find that model diversity can significantly enhance the robustness of systems.

Adversarial Defence
An adversarial attack is a technique employed to deliberately manipulate or deceive machine learning models by introducing carefully crafted perturbations into input data. These perturbations are often imperceptible to humans but can cause the model to misclassify or produce erroneous outputs. Adversarial attacks can be categorized into various types, such as white-box attacks, where the attacker has full knowledge of the target model, and black-box attacks, where the attacker has limited or no information about the target model. Adversarial attacks pose significant challenges to the robustness and security of machine learning systems, as they can undermine the reliability of models in real-world applications. Adversarial defense refers to methods and techniques aimed at protecting machine learning models from adversarial attacks. These defenses seek to enhance the robustness and resilience of models against adversarial perturbations introduced into input data. Adversarial defense methods can be broadly classified into two categories. Adaptive defenses: these dynamically adjust the model's parameters or architecture in response to detected adversarial attacks, for example adversarial training and gradient masking. Adversarial training is a powerful technique that enhances a model's robustness by exposing it to both pristine and adversarially perturbed data. This method helps the model learn to generalize and resist potential attacks. Another effective strategy is gradient masking, which conceals gradient information to thwart attackers from crafting potent adversarial examples. By obfuscating the gradients, models become less vulnerable to adversarial manipulations.
Detective defenses: these are designed to identify and neutralize adversarial examples before they inflict damage. These techniques employ a variety of strategies to ensure the integrity of the model's predictions. Anomaly detection is one such method, which scrutinizes data for anomalies that deviate from typical patterns, thereby flagging them as potential adversarial threats. Another approach is adversarial sample detection, which analyzes input samples to identify those that provoke atypical or unpredictable responses from the model. By implementing these proactive measures, detective defenses fortify the system against malicious attempts to compromise its performance.
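As a concrete illustration of the adaptive category, the sketch below performs one adversarial-training step: FGSM examples are crafted on the fly and the model is updated on a mix of clean and perturbed inputs. The 50/50 loss weighting and ϵ value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=8 / 255):
    """One adversarial-training step mixing clean and FGSM-perturbed samples."""
    # Craft FGSM examples against the current model.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

    # Train on an equal mix of clean and adversarial inputs.
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```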
As previously outlined, the predominant strategy for thwarting adversarial attacks has been to apply supplementary defenses and strengthen individual models. However, we hold that exclusive reliance on these supplementary measures and a model's inherent defenses does not provide a foolproof security solution. Our objective is to develop the DHR architecture, which is intended to significantly improve the model's security posture. We have made attempts from the following perspectives.

Defending against Adversarial Attack through Preprocessing Diversity
Methods. In this approach, we enhance the robustness of machine learning models by promoting diversity in data representation. By applying different transformation techniques to the same dataset, each model is exposed to uniquely represented data, fostering a variety of insights and capabilities. This makes it difficult for adversarial perturbations to transfer effectively between models, thereby increasing system resilience. Through different transformers, multiple transformed (TF) datasets are derived from the raw data and are then used to train various TF models, thereby constructing an ensemble model system. The specific process is shown in Figure 5.
Experimental Setup. Emphasizing both representativeness and simplicity, we focused on a classic image classification task for our evaluation. We transformed data from the same source dataset in various ways to ensure that the task objectives remained consistent, maintaining label-sample correspondence. This uniformity aids in achieving comparable classification goals. We selected different image processing techniques to highlight unique features, employing Canny edge detection [44], LBP [45], and GLCM [46], the last of which uses seven different statistical measures for image transformation. The images, drawn from a subset of the ImageNet dataset, comprise five categories totaling around 10,000 images. We conducted training using the standard ResNet-18 model with images normalized to 224×224 pixels.
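For illustration, a minimal sketch of two of the transformed views is given below, assuming OpenCV and scikit-image are available; the Canny thresholds and LBP parameters are illustrative, and a GLCM-statistics view would be added analogously to complete the ensemble.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def canny_view(img_gray):
    """Edge-map (Canny) representation of a uint8 grayscale image."""
    return cv2.Canny(img_gray, 100, 200)

def lbp_view(img_gray):
    """Local-binary-pattern texture representation of the same image."""
    lbp = local_binary_pattern(img_gray, P=8, R=1, method="uniform")
    return (255 * lbp / (lbp.max() + 1e-8)).astype(np.uint8)

# Each transform produces its own dataset, and one sub-model is trained per view;
# the resulting TF models are then combined into the ensemble system of Figure 5.
views = {"canny": canny_view, "lbp": lbp_view}
```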
Results. Figure 6 illustrates that as the perturbation size increases, the accuracy of the attacked model declines sharply, while models using different transformations (LBP, Canny, GLCM) respond differently to the same perturbations. Notably, Table 1 shows that when one model's accuracy dips below 0.01, the others can still maintain over 0.9 accuracy, highlighting resilience against transferred perturbations. This indicates that by using different transformation methods, we create diversity in data processing, thereby constructing a DHR-like architecture that significantly enhances the model's resistance to adversarial attacks.

Defending against Adversarial Attack through Weight Diversity
Methods. By increasing the diversity of training gradients, we can reduce the transferability of adversarial examples. Considering the complexity of gradient computation, we decided to enhance diversity from the perspective of weights. Since gradients are closely related to a model's weights, increasing the diversity of weights is, to some extent, equivalent to increasing the diversity of gradients. To promote diversified training, we defined two metrics to quantify the diversity of model weights and regulated them during training: Strengthening Weight Concentration (SWC) and Penalizing Weight Correlation (PWC), which encourage greater differentiation among sub-models. The weight distributions of the same layer of the same model under different training methods are shown in Figure 7.
Experimental Setup. We utilized the ImageNet100 dataset to train a ResNet-18 network for the purpose of classifying images into 100 distinct classes. Concurrently, we conducted a comparative analysis with existing research on ensemble model diversity. Specifically, we examined the ADP method, which leverages behavioral diversity, and the DEG method, which is rooted in gradient diversity. Our evaluation encompassed three attack methodologies, comprising two white-box attack strategies and one black-box approach, to assess transferability. During testing, we subjected one of the models to attacks and evaluated accuracy by aggregating the outputs of all three models. The constraints applied to the attacks are denoted as 'Para.' in the corresponding Table 2. Notably, the PGD and BIM methods underwent 10 iterations each, while the SPSA method was iterated 5 times.
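Before turning to the results, the sketch below shows one way a PWC-style penalty could be realized: the cosine similarity between corresponding layer weights of different sub-models is penalized so that their weight distributions drift apart. The exact SWC/PWC formulations are not reproduced here; this penalty and the joint-training usage are assumptions for illustration, and the sub-models are assumed to share an architecture.

```python
import torch
import torch.nn.functional as F

def weight_correlation_penalty(models):
    """Illustrative PWC-style regularizer: penalize pairwise cosine similarity between
    corresponding (flattened) layer weights of the sub-models."""
    penalty = torch.zeros(())
    params = [list(m.parameters()) for m in models]
    for layer_group in zip(*params):            # corresponding layers across sub-models
        flat = [w.flatten() for w in layer_group]
        for i in range(len(flat)):
            for j in range(i + 1, len(flat)):
                penalty = penalty + F.cosine_similarity(flat[i], flat[j], dim=0).abs()
    return penalty

# Example joint-training objective (lambda_pwc is an illustrative hyperparameter):
# total_loss = sum(task_losses) + lambda_pwc * weight_correlation_penalty(sub_models)
```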
Results. Figure 7 compares weight distributions across sub-models within an ensemble model, showing varying degrees of concentration. While the ADP method exhibits a slight tendency towards concentration, our proposed method demonstrates a more pronounced divergence, with the primary weight values concentrated on a minority of nodes, underscoring its distinctive impact on the model weight distribution. Table 2 reports the accuracy of adversarial examples generated by attacking a single model while all models recognize the input simultaneously on the ImageNet100 dataset. From Table 2 it is clear that adversarial examples tailored to individual models severely impair their performance, rendering them nearly ineffective. In contrast, our proposed method demonstrates superior identification capability compared to the DEG and ADP methods. All diversity-enhancing techniques show a significant enhancement in recognition accuracy. By leveraging diversity in weights, we have successfully enhanced the robustness of the model, constructing a DHR system.

Defending against Adversarial Attack through Data Diversity
Methods. We propose a diversity training method for large datasets to regulate learning-data diversity without incurring additional training costs. This method [43] leverages model feedback to control diversity, with four regularization operations tailored to specific categories: enhancing model performance (EMP), enhancing model divergence (EMD), enhancing single individuals (ESI), and enhancing error disagreement (EED). Our empirical evaluation demonstrates that this approach significantly boosts model robustness while having minimal impact on performance. We refer to this method as enhancing adversarial robustness through diversity supporting robustness (EADSR).
Experimental Setup. We again utilized the ImageNet100 dataset and selected two different configuration parameters, 0.03 and 0.1, representing the proportion of perturbation introduced into the data. We tested five attack methods: FGSM, BIM, PGD, A-PGD, and AutoAttack (AA). We employed a stricter attack strategy by treating the multiple models as a single entity and attacking them simultaneously, causing the gradients of all models to decrease at the same time. We also compared our approach with previous methods that use multi-model defenses against attacks, namely ADP and PDD. The ADP method enhances the robustness of ensemble models by increasing the behavioral diversity among the individual models, while the PDD method improves diversity by differentiating the dropout strategies applied to the final layer of each model.
Table 3. Recognition accuracy (%) under white-box attacks with control parameter (Para.) as ϵ in L∞ for FGSM, BIM, PGD, A-PGD, and AutoAttack. Note: the ensemble size is K = 3 and the optimal performance is marked in bold. [43]
Result. As depicted in Table 3, even with reduced perturbation intensity, the performance of the attacked models drops drastically on smaller datasets. EADSR also shows decreased performance under attack but maintains significant robustness gains without substantial performance decline. Compared to existing diversity methods, EADSR exhibits notable accuracy improvements against iterative attacks, albeit with a slight performance impact (a 4% decrease) when the perturbation strength is increased to 0.1. By leveraging differences in data and training methods, the DHR system we built significantly enhances the robustness of the model.
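To make the "single entity" attack protocol above concrete, the following sketch runs PGD against the averaged logits of all sub-models so that every model's loss rises simultaneously; the step size, budget, and iteration count are illustrative, not the settings reported in the tables.

```python
import torch
import torch.nn.functional as F

def pgd_on_ensemble(models, x, y, epsilon=0.03, alpha=0.007, steps=10):
    """PGD sketch that treats the ensemble as a single entity by attacking the mean logits."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = torch.stack([m(x_adv) for m in models]).mean(dim=0)
        grad = torch.autograd.grad(F.cross_entropy(logits, y), x_adv)[0]
        # Take a signed gradient step and project back into the L-infinity ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv
```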

Backdoor Defence
A backdoor attack implants a backdoor into a model, making the model particularly sensitive to a certain trigger and exhibiting specific abnormal behaviors when the trigger is present. In the field of image classification, the purpose of a backdoor attack is to make the model perform normally on clean samples but classify any sample containing the trigger into a specified wrong class. Since backdoored models behave normally on clean samples, backdoor attacks possess excellent stealthiness, and there is significant flexibility in the selection of attack targets, triggers, and attack methods. Backdoor attacks on DNNs raise significant security concerns for the application of artificial intelligence.
Researchers are actively working on developing defense mechanisms to protect models against such attacks. The defense strategies against backdoors mainly cover three aspects. Data-level: to defend against backdoor attacks based on data poisoning, data filtering is conducted first to remove any abnormal data, thus preventing the model from being implanted with a backdoor. A typical method is PatchSearch [47], which utilizes Grad-CAM [48] to locate the suspicious patch; however, this method can only be applied to patch-form triggers and cannot detect triggers such as optical triggers.
Train-level: to defend against backdoor attacks based on data poisoning, the main idea is to train a clean model on a poisoned dataset by manipulating the training process. One typical method is ABL [49], which employs a strategy of gradient ascent on poisoned samples, thereby preventing the implantation of the backdoor. The shortcoming of this method is also evident: it may misjudge some high-quality normal samples as poisoned, thereby affecting the model's performance.
Model-level: to remove the backdoor from a trained model that may have been implanted with one. The general process can be further divided into backdoor detection and backdoor removal. In this scenario, the defender often has only a limited amount of training data. A typical method is SSL-Cleanse, which first decides whether the model has been attacked and identifies the attack target, then fine-tunes the model to remove the backdoor. However, this method can only address single-target attacks and cannot address multi-target attacks.
Existing methods such as data filtering, backdoor detection, and backdoor removal cannot completely protect models from backdoor attacks. Thus, we turn to DHR, hoping to defend against backdoor attacks by adopting a DHR architecture whose main idea is diversity. We have made attempts from the following perspectives.

Defending against Backdoor Attacks through Weight Diversity
Method. Self-supervised learning [50-52] aims to train an image encoder on an unlabeled dataset, which is then utilized to train a downstream classifier. A backdoored encoder produces features for samples containing the trigger that are close to the target class [53], which results in any sample embedded with the trigger being classified into the target class. Determining whether a model has been attacked, or the category of the attack, is a very challenging task with the possibility of misjudgment. In a self-supervised learning model, certain neurons exhibit heightened activity when confronted with triggers [54], leading to the occurrence of backdoor attacks. If we can adjust the weights of these neurons, there is a significant possibility of removing the backdoor. Therefore, we propose the defense method of Anti-backdoor Through Active Attack (ATA), which first uses custom triggers to attack all classes and implant a backdoor in the model, and then conducts desensitization training to reduce the model's sensitivity to the custom triggers. Through this process, we render the custom trigger ineffective while the model's weights are effectively adjusted. Since the effectiveness of the triggers used by the attacker depends on specific weights of the model, the aforementioned adjustments have a high probability of disrupting the weights associated with the attacker's triggers. By training multiple models in this way, we obtain a set of models with diverse weights. By integrating this set of models into an ensemble model, we can effectively defend against backdoor attacks using the DHR architecture.
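A highly simplified sketch of the desensitization stage is given below: the encoder is pushed to produce the same features for clean inputs and inputs stamped with the defender's custom trigger. The trigger application, mask, and cosine-similarity loss are illustrative assumptions, not the exact ATA training objective.

```python
import torch
import torch.nn.functional as F

def apply_custom_trigger(x, trigger, mask):
    """Stamp a defender-chosen (custom) trigger onto a batch of images."""
    return x * (1 - mask) + trigger * mask

def desensitization_loss(encoder, x, trigger, mask):
    """Encourage identical features for clean and custom-triggered inputs, disturbing the
    weights that any trigger-sensitive neurons rely on."""
    clean_feat = encoder(x).detach()                    # reference features of clean inputs
    trig_feat = encoder(apply_custom_trigger(x, trigger, mask))
    return 1.0 - F.cosine_similarity(trig_feat, clean_feat, dim=-1).mean()
```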
Experimental Setup. The experiment utilizes the CIFAR-10 dataset and employs the BadEncoder [53] attack method, known for its strong effectiveness, during the active attack stage. The testing evaluates the defensive effectiveness against various attack methods, including SSL Backdoor [55], CTRL [56], and BadEncoder [53]. The experimental results are shown in Table 4. In the experiment, we employ three metrics: ACC represents the classification accuracy on clean datasets, ASR represents the success rate of backdoor attacks (the fraction of poisoned samples classified into the specified class), and PACC represents the classification accuracy on the poisoned dataset.
Results. The experimental results demonstrate that our DHR-based method ATA performs excellently across the various attack methods, reducing the ASR to a nearly negligible level while maintaining the model's classification accuracy on clean samples. Additionally, the PACC is close to the ACC, indicating that the integrated model of the DHR architecture is completely desensitized to the triggers.

Defending against Backdoor Attacks through Data Diversity
Method. Backdoor erasing methods are a category of backdoor defense methods. Fine-tuning is the simplest backdoor erasing method, which erases the backdoor by training the backdoored model on a small amount of clean data. Ensemble methods are commonly used in the field of adversarial defense. Attackers need to deceive the majority of sub-models in an ensemble model to achieve their attack goals, where each sub-model has a different network structure and weight parameters. Therefore, theoretically, ensemble models are usually more robust than single models and can defend against various adversarial attack methods. However, ensemble methods require a large amount of clean data to train the sub-models, which most backdoor eradication methods cannot provide. As shown in Figure 8, to address this gap, we propose integrating multiple backdoor mitigation strategies through ensemble distillation to strengthen backdoor elimination. We perform backdoor eradication on the backdoored model using a small amount of clean data and simple backdoor eradication methods. Specifically, we employ fine-tuning, Re-init, and Fine-Pruning separately, obtaining three distinct and relatively clean models as teacher models for the distillation process. Finally, we use the teacher models obtained from the previous steps and the augmented data to conduct ensemble distillation on the original backdoored model. The student model learns clean knowledge from the three teacher models, with the student network's intermediate layers emulating the outputs of the teacher models' intermediate layers.
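A minimal sketch of such an ensemble-distillation objective is shown below: the student matches the averaged soft predictions of the three teachers and mimics their intermediate-layer outputs. The temperature and equal loss weights are illustrative choices rather than values from the experiments.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, student_feat,
                               teacher_logits_list, teacher_feat_list, T=4.0):
    """Student loss: KL to the averaged teacher soft labels plus MSE to teacher features."""
    soft_teacher = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(dim=0)
    kd_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                       soft_teacher, reduction="batchmean") * (T * T)
    feat_loss = torch.stack(
        [F.mse_loss(student_feat, tf) for tf in teacher_feat_list]).mean()
    return kd_loss + feat_loss
```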
Results. Table 8 demonstrates the effectiveness of the six attack methods on the three datasets (ASR and CA) and evaluates the defense effects of ensemble distillation (△ASR and △CA). We observe that ensemble distillation effectively eradicates the backdoors in all backdoor models; the average △ASR decreased to 24%. At the same time, the model's performance on clean samples is not significantly affected, with an average △CA remaining at around 80%. These results indicate that ensemble distillation is effective in defending various models against different backdoor attacks.

Poisoning Defence
Method. A poisoning attack targets graph-based machine learning models by manipulating the data before training [67]. Poisoning attacks usually modify the graph data before the model is trained, with the intention of reducing the model's predictive performance on certain tasks, such as node classification or graph classification, without significantly altering the appearance of the graph data [68,69].
Poisoning attacks on graph-based models pose substantial threats to the reliability and efficacy of artificial intelligence applications.
While current defense models attempt to clean graph data from the graph-structure view and the node-feature view [70], these models often suffer from the shortcomings and limitations of model singularity. In fact, models based on the graph-structure view and the node-feature view are complementary. By using graph-data diversity, the limitations of a single view can be overcome. This strategy exploits the complementary nature of the two views, enabling GNNs to capture and identify potentially anomalous attacks through data diversity, thus improving the comprehensiveness and effectiveness of the defense. In addition, the strategy enhances the robustness of GNNs: an attacker needs to bypass multiple models at the same time to successfully execute an attack, which greatly increases the difficulty of the attack. At the same time, data diversity also allows GNNs to adapt more flexibly to new attacks by adjusting the weights and strategies of the different views to cope with changing attack patterns.
Therefore, we achieve a more effective defense based on the idea of DHR by integrating multiple graph views in order to combine the topology and attribute information of the graph. Specifically, we propose DHRGNN, which integrates three heterogeneous models to defend against poisoning attacks. First, we use the SVD-decomposition-based model (SVD) and the Edge-Boosted Attention model (E-Boost) to clean up perturbations and generate robust graphs from the graph-structure and node-feature views, respectively. Then, we use a Robust Information Combination model (RIC) to fuse the information of the two robust graphs, reducing the impact of poisoning attacks on any single model and improving the robustness of the overall model.
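For illustration, the sketch below shows the structure-view cleaning step as a low-rank SVD reconstruction of a dense adjacency matrix, together with a simple convex-combination stand-in for the RIC fusion; the rank and fusion weight are illustrative assumptions, and the E-Boost view is omitted.

```python
import torch

def svd_clean_adjacency(adj, rank=10):
    """Keep only the top-k singular components of a dense adjacency matrix, which tends to
    suppress the high-frequency perturbations introduced by poisoning attacks."""
    u, s, vh = torch.linalg.svd(adj)
    return u[:, :rank] @ torch.diag(s[:rank]) @ vh[:rank, :]

def fuse_views(logits_structure_view, logits_feature_view, alpha=0.5):
    """Simplified stand-in for RIC: a convex combination of the two views' predictions."""
    return alpha * logits_structure_view + (1 - alpha) * logits_feature_view
```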
Experimental Setup. We compare different models on datasets commonly used in the GNN field, including Cora [71] and Citeseer [72]. Cora comprises 2708 machine learning papers in 7 classes, widely used in graph deep learning, with a network of 5429 links. CiteSeer is a citation network with 3312 scientific publications in 6 categories and 4732 links. We conduct a comparative evaluation under the Meta [73] and Nettack [74] attacks. Meta treats the graph as a hyperparameter and uses meta-gradients to perturb the graph structure.

Applications to Real-world Object Detection
To further verify the effectiveness and feasibility of this method, we decided to test it using commonly used object detection systems in real-world scenarios. Object detection is a deep learning task aimed at identifying and locating objects within images by marking them with bounding boxes. This technology has a wide range of applications across various fields, such as autonomous driving [79] and remote sensing image analysis [80].
Based on the Dynamic Heterogeneous Redundancy (DHR) concept, we developed a diversity training method [81] to combine multiple heterogeneous models into an ensemble object detection model. We performed attack tests on this diversity ensemble model (DEM) using mainstream adversarial attack methods and compared the effects with a baseline ensemble model (BEM) trained without this method. The experimental results are shown in Table 11 and Table 12. The experiment in Table 11 is a white-box scenario, in which we perform adversarial attacks on the first sub-model of the two ensemble models and then feed the generated adversarial samples into the corresponding ensemble models to obtain the mean average precision (mAP) of each sub-model and of the ensemble models. The data in Table 11 show that the detection performance of the DEM trained with our diversity method is always higher than that of the BEM across different datasets and attack methods. In addition, under the same attack environment, the mAP of the two unattacked sub-models of the DEM is always higher than that of the corresponding sub-models of the BEM, which shows that the diversity training method can reduce the transferability of adversarial samples between sub-models within the ensemble model and demonstrates the effectiveness of the diversity training method.
The experiment in Table 12 is the anti-transferability test in the black-box scenario. The adversarial samples in this scenario are generated by attacking models other than the DEM and BEM, in order to verify whether the diversity training method remains effective in the black-box setting. The data in Table 12 show that all models maintain a good mAP when not attacked. After the adversarial attack, the mAP of the attacked model is close to 0 and its detection ability is almost lost. These adversarial samples are then fed into the two ensemble models to test their anti-transferability. Under the influence of the adversarial samples, the mAP of each ensemble model generally decreases. However, the mAP of the DEM is always higher than that of the BEM, whether viewed from the perspective of the ensemble model or of the sub-models, which shows that in this black-box scenario the DEM is less affected by adversarial samples and has stronger anti-transferability. The data in Tables 11 and 12 demonstrate the effectiveness of DHR in ensemble object detection tasks.
In addition, to verify the defensive effect of this model in the real world, we conducted an attack test on our ensemble model based on the adversarial patch method. This attack targets the "person" class, and its goal is to make the object detection model fail to detect a person wearing the patch; the experimental results are shown in Figure 9. In the intense competition of the 6th "QiangWang" Mimic Defense International Elite Challenge, as the organizers, we deployed the ensemble object detection model based on the Dynamic Heterogeneous Redundancy (DHR) architecture, fusing the outputs of multiple sub-models into a final output through a specific decision-making mechanism. The model demonstrated excellent performance during the competition, effectively resisting various adversarial attacks and backdoor attacks, and fully demonstrating the effectiveness of the DHR idea in enhancing the robustness of AI models in practical application scenarios. As shown in Table 13, in an ensemble object detection system, attacking only the first sub-model and feeding the generated adversarial samples into the ensemble model yields the mAP of each sub-model. We observe that when a single model faces white-box attacks, its detection ability decreases significantly; the robustness of a single model is seriously insufficient. However, the transfer effect of adversarial samples generated against the attacked sub-model on the other sub-models is limited. Although such samples may still reduce the mAP of the other sub-models to a certain extent, the attack effect is greatly weakened, and the other sub-models retain good detection capabilities. Based on this finding, during the competition we deployed this ensemble model and disclosed only one sub-model to the contestants. This strategy, combined with multiple layers of defense, significantly reduces the impact of adversarial samples on the ensemble model.
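As a final illustration of the ruling step in the detection setting, the sketch below keeps only boxes on which at least two sub-models agree (by IoU); it is a simplified consensus rule, not the decision mechanism actually deployed in the competition system, and it only uses the first sub-model's boxes as candidates.

```python
import torch
from torchvision.ops import box_iou

def adjudicate_detections(per_model_boxes, iou_thr=0.5, min_votes=2):
    """DHR-style ruling sketch for object detection. `per_model_boxes` is a list of
    [N_i, 4] tensors in (x1, y1, x2, y2) format, one per sub-model; a candidate box from
    the first sub-model is kept only if at least `min_votes` sub-models support it."""
    kept = []
    for box in per_model_boxes[0]:
        votes = 1
        for other in per_model_boxes[1:]:
            if other.numel() and box_iou(box.unsqueeze(0), other).max() >= iou_thr:
                votes += 1
        if votes >= min_votes:
            kept.append(box)
    return torch.stack(kept) if kept else torch.empty((0, 4))
```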

Conclusion
In this paper, we analyzed the current state of attack and defense in DL and found that the interplay between them has evolved into a Nash equilibrium, making it difficult to fully resolve security issues.
To address this dilemma, we introduced the concept of ESS and reclassified existing security problems accordingly. We posit that this predicament arises from the challenge of resolving ESS issues. To address this challenge, we emphasize the importance of the necessary diversity and propose employing DHR to tackle ESS problems. We conducted multiple diversity case studies across various deep learning applications such as image classification, object detection, natural language processing, and graph neural networks. Through simple ensemble strategies like majority voting and averaging, we validated the importance of introducing diversity, which resulted in significant defensive effectiveness against adversarial attacks, backdoor attacks, poisoning attacks, and more. These experiments demonstrate the system's capability to maintain robustness even when individual models lack robustness, effectively mitigating various security threats. By introducing the necessary diversity, it is possible to enable deep-learning-based application systems to operate robustly even when individual models are not robust. This work aims to provide a new, viable approach to addressing AI security issues. By designing deep learning (DL) application systems with ESS properties at the system architecture level, we aim to address the challenge of ensuring robustness in individual DL models from a novel perspective. Through preliminary validation, we believe this is a viable new path. By introducing diversity, it may be possible to make deep learning applications affected by security issues genuinely applicable in real life. Diversity is indeed at the core of the DHR architecture, but the architecture also encompasses elements such as dynamics, adjudication, and negative feedback adjustment. Verification of diversity is just one step in exploring its feasibility. In subsequent research, key focuses include how to construct diverse models, how to evaluate heterogeneity among models, and how to conduct adjudication. This article introduces a new approach to addressing security issues in deep learning and provides limited validation to demonstrate its feasibility. However, further exploration is needed for solutions to AI security, especially as the rapid development of large language models (LLMs) brings forth new security challenges such as injection attacks, jailbreaking attacks, and more. LLMs are developed based on deep learning (DL) and inherently possess the various security issues characteristic of DL. Whether the ESS method can be used to address the security issues of LLMs requires further exploration in the future.


Figure 1. Existing classification of safety and security problems in DL.

Figure 2. New classification of safety and security problems in DL based on the ESS theory.


Figure 3. The essence of an ensemble model is to combine the security spaces of multiple models. [43]


Figure panels: (a) Neural Network Training Process; (b) Construction process of the sub-model pool.


Figure 6. Comparison of attack results between different models under transfer attacks: (a) LBP is the attacked model; (b) Canny is the attacked model; (c) Mean is the attacked model; (d) Max is the attacked model.

Figure 7. Distribution of weight values for an identical layer across ensemble model sub-models.



Figure 8. Flowchart of the method. (a) Use a small set of clean samples to erase the backdoor from the backdoored model and obtain multiple teacher models; then (b) utilize ensemble distillation through the multiple teacher models to obtain a cleaner student model.




Figure 9. Illustration of the proposed ensemble adversarial defense against a physical adversarial attack on person detectors.

Jiayu Du is currently engaged in research on endogenous security at Purple Mountain Laboratories. He earned his master's degree in Communication Engineering from Dalian University of Technology. His research fields include digital image forensics, big data storage, cloud computing, and artificial intelligence. He has attempted to solve safety and security issues in these fields using endogenous security methods.

Table 1. Attack transfer statistics for each model.


Table 2. Recognition accuracy (%) under grey-box attacks with control parameter (Para.) as ϵ in L∞ for BIM, PGD, and SPSA. The ensemble size is K = 3. The best performance is marked in bold.

Table 4. Our defense method ATA under different attack methods.

Table 6. The results of data diversification, where "train" and "test" respectively denote the number of samples in the training set and the test set.

Table 7. The performance of BadNets on three datasets and the defense effect of fine-tuning with different data augmentation methods.

Table 8. The performance of the six attack methods on three datasets and the defense effect of ensemble distillation. BN denotes BadNets, RP denotes RIPPLe, Ins denotes InsertSent, HK denotes HiddenKiller, and Style denotes StyleBkd.

Table 11. The mAP (%) of the attacked sub-models and the ensemble models trained on different datasets with the YOLOv3 model, under attacks from different methods on the first sub-model.

Table 12. The mAP (%) of all sub-models and the ensemble models when facing adversarial examples generated by an unrelated model, trained on different datasets with the YOLOv3 model.

Table 13. Comparison of transferability among sub-models within the ensemble model, based on the self-made Road Traffic Sign Dataset.