RADNN: Robust to Imperceptible Adversarial Attacks Deep Neural Network



Introduction
In recent years, deep neural networks have achieved tremendous success, reaching high accuracy on complex applications such as computer vision and natural language processing [1]. However, recent findings have shown that deep learning models have several vulnerabilities to adversarial attacks. Deep learning models tend to make wrongly overconfident predictions on modified data [2]. Furthermore, their "black-box" nature makes it extremely difficult to audit their decisions [3].
Security aspects of machine learning are extremely important, especially in high-stakes applications such as autonomous cars [4]. In particular, robustness to adversarially chosen inputs is becoming a crucial design goal, as recent work [5] shows that an adversary is often able to manipulate the input so that the model produces an incorrect output.
Hence, defending against such attacks has become an important research topic, and many approaches to improve model security and robustness have been proposed, including improvements to model design, training data augmentation, input preprocessing, and defensive validation, among others [6]. Identifying vulnerabilities and addressing them also plays a vital role in obtaining a more robust model [7].
In this paper, we propose a prototype-based method that is able to detect changes in data patterns and identify imperceptible adversarial attacks in real time. Unlike traditional approaches, the proposed method does not require specific training on adversarial data to improve its robustness.

Imperceptible Adversarial Attacks
Image adversarial attacks focus on perturbations that cause misclassification by a deep classifier while remaining imperceptible to humans [8]. However, visual changes in the image become perceptible when larger perturbations are used to improve the ability to fool a classifier. The PerC algorithm hides large perturbations in the RGB space in a way that is not noticeable to humans [9].
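As a concrete illustration of the general threat model (not the PerC algorithm itself, which constrains perturbations in perceptual color space), the following sketch applies a standard FGSM-style gradient-sign perturbation to a toy linear model. The weights, input, and budget `eps` are illustrative assumptions:

```python
import numpy as np

# FGSM-style sketch: perturb an input in the direction that increases the
# loss, under an L_inf budget `eps` small enough to be near-imperceptible.
# The linear "model" below is a toy stand-in for a deep classifier.

rng = np.random.default_rng(0)
w = rng.normal(size=16)          # toy model weights (assumption)
x = rng.normal(size=16)          # clean input
y = 1.0                          # true label in {-1, +1}

def loss_grad(w, x, y):
    # gradient of a simple margin loss L = -y * (w . x) with respect to x
    return -y * w

eps = 0.03                       # perturbation budget (illustrative)
x_adv = x + eps * np.sign(loss_grad(w, x, y))

print(float(np.max(np.abs(x_adv - x))))  # perturbation stays within eps
```

PerC replaces the L_inf (RGB) budget above with a perceptual color-distance constraint, which is what lets it hide larger raw perturbations.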

Robust to Adversarial Attacks Deep Neural Network (RADNN)
The proposed RADNN is equipped with a mechanism that allows real-time concept drift detection (data pattern change detection) due to its density-based nature and its prototype-based design. The RADNN approach is described as a feedforward neural network. The training architecture is composed of the following layers: 1. Features layer; 2. Density layer; 3. Conditional probability layer; 4. Prototype identification layer. RADNN is trained per class; thus, it is composed of multiple structures, one for each class, as illustrated by Fig. 2.

Features layer
This layer is in charge of extracting the feature vector. RADNN has a flexible structure, so this layer can be formed by different methods, including convolutional neural networks [10], residual neural networks such as ResNet [11], Inception-ResNet [12], Transformer-based approaches [13], or even a combination of multiple sources for feature extraction. In this paper, we consider VGG-16 [14] as the feature extractor.
The training dataset is defined as x = {x_1, ..., x_N} ⊂ R^n with corresponding class labels y_1, ..., y_N ∈ {1, ..., C}, where N is the number of training data samples, n is the dimensionality (number of features), and C is the number of classes. The most representative data samples in the dataset are chosen as prototypes π ∈ P ⊂ X for each class, where M_j denotes the total number of prototypes of class j, M_j = |P_j|, and M = Σ_{j=1}^{C} M_j. In this paper, we consider more than one prototype per class, so M_j > 1 for all j.

Density layer
This layer is defined by neurons whose activation function is the data density, D. The density function measures the mutual proximity of the data samples in the data space and can be represented by the following Cauchy function [15]:

D(x) = 1 / (1 + ‖x − μ‖² / σ²),

where D is the density, μ is the global mean, and σ² is the variance. As demonstrated theoretically in [15], the mutual proximity of data samples in the data space under the Euclidean (or Mahalanobis) distance takes the form of a Cauchy function.
Data density can be updated recursively by [16]:

D(x_i) = 1 / (1 + ‖x_i − μ_i‖² / (X_i − ‖μ_i‖²)), i = 1, ..., N,

where the mean μ_i and the mean of scalar products X_i are updated recursively as follows:

μ_i = ((i − 1)/i) μ_{i−1} + (1/i) x_i, with μ_1 = x_1;
X_i = ((i − 1)/i) X_{i−1} + (1/i) ‖x_i‖², with X_1 = ‖x_1‖².

Data density, D, denotes the degree of closeness of a data sample to the mean, μ. For values normalized between 0 and 1, the density lies in the range 0 < D ≤ 1, with D = 1 when x = μ. In this sense, data samples that are closer to the global mean have higher density values. The data density value indicates how strongly a particular data sample is influenced by the other data samples in the data space through their mutual proximity; it measures the centrality of a data sample and its eligibility to become a prototype.
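The recursive density update can be sketched as follows; the Cauchy form and the recursive mean/scalar-product updates follow the density formulation the paper cites [15, 16], while variable names and the toy data are illustrative:

```python
import numpy as np

# Recursive data density: D(x) = 1 / (1 + ||x - mu||^2 / sigma^2), where the
# mean `mu` and the mean of scalar products `X` are updated one sample at a
# time, so the density can be computed in a single streaming pass.

def recursive_density(samples):
    densities = []
    mu, X = None, None
    for i, x in enumerate(samples, start=1):
        if i == 1:
            mu, X = x.copy(), float(x @ x)
        else:
            mu = (i - 1) / i * mu + x / i
            X = (i - 1) / i * X + float(x @ x) / i
        sigma2 = X - float(mu @ mu)      # recursive variance estimate
        if sigma2 <= 0:                  # first sample: x == mu exactly
            densities.append(1.0)
        else:
            d = 1.0 / (1.0 + float((x - mu) @ (x - mu)) / sigma2)
            densities.append(d)
    return densities

data = [np.array([0.1, 0.2]), np.array([0.2, 0.1]), np.array([0.9, 0.8])]
print(recursive_density(data))
```

Note that the sample far from the running mean (the third one) receives the lowest density, consistent with density measuring centrality.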

Conditional probability layer
The typicality, τ, layer or conditional probability layer is estimated from empirical data as in [15]. It is given by eq. (6), a multi-modal version of the pdf satisfying ∫_{−∞}^{∞} p(C|x) dx = 1 [15]:

p(C|x) = Σ_{i=1}^{C} (N_i / N) D_i(x),  (6)

where N_i denotes the number of data samples associated with the i-th data cloud and Σ_{i=1}^{C} N_i = N. p(C|x) does not rely on any prior assumption about the data [15].

Prototypes layer
RADNN is trained per class, as illustrated by Fig. 2, so the calculations are done for each class separately. Prototypes are independent data samples defined as the local peaks of the data density. Thus, the prototype set can be altered (prototypes can be added or removed) without affecting the other existing ones.
Data samples are assigned to the nearest prototype as:

j* = argmin_{j=1,...,M} ‖x_i − π_j‖.  (7)

New prototypes are added to the set of prototypes if the following condition is met [15]:

IF D(x_i) > max_{j} D(π_j) OR D(x_i) < min_{j} D(π_j) THEN add x_i as a new prototype.  (8)
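The nearest-prototype assignment and the add-prototype rule above can be sketched as follows; `density` stands for any callable returning D(x) (here a toy density peaked at the origin is used for the demonstration), and the condition checked is the one from [15], that a sample becomes a prototype when its density exceeds the maximum or falls below the minimum over the existing prototypes:

```python
import numpy as np

# Prototype-layer sketch: assign each sample to its nearest prototype;
# a sample whose density is above the max (a new local peak) or below the
# min (a new region of the data space) over current prototypes is itself
# promoted to a prototype.

def nearest_prototype(x, prototypes):
    dists = [float(np.linalg.norm(x - p)) for p in prototypes]
    return int(np.argmin(dists))

def maybe_add_prototype(x, prototypes, density):
    d_x = density(x)
    d_protos = [density(p) for p in prototypes]
    if d_x > max(d_protos) or d_x < min(d_protos):
        prototypes.append(x)      # x becomes a new prototype
        return True
    return False

protos = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
x = np.array([0.1, 0.0])
print(nearest_prototype(x, protos))   # index of the closest prototype
```

Because each prototype is an independent local density peak, appending or removing entries in `prototypes` leaves the others untouched, matching the modularity claimed above.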

Learning Procedure
The RADNN learning mechanism is summarised by the following pseudo-code.

RADNN: Learning Procedure
1: Read the first feature vector sample x_i of class c;
2: Normalise the data as detailed in [17];
3: Create the first prototype from the first sample: j ← 1, π_1 ← x_1;
4: FOR each subsequent sample: search for the nearest prototype according to eq. (7);
5: IF the condition in eq. (8) is met THEN create a new prototype: j ← j + 1;
6: ELSE assign the sample to its nearest prototype.

Features layer
This layer is similar to the feature extraction layer described in the training phase.

Local decision-making
This layer is responsible for calculating the degree of similarity, S, between an unlabeled data sample and the respective nearest prototype. The similarity degree between any new image and a given prototype is determined by a SoftMax-like equation (9), where Y is the j-th validation data sample and S is the degree of similarity between the unlabeled data sample and the respective prototype.
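One plausible form of such a SoftMax-like similarity is sketched below (the exact form of eq. (9) may differ): distances to the per-class nearest prototypes are turned into scores with exp(−d) and normalized to sum to one, so the winning class comes with a confidence value. The prototype sets are illustrative:

```python
import numpy as np

# SoftMax-like local decision sketch: for each class, take the distance to
# its nearest prototype, convert distances to scores, and normalize so the
# scores behave like per-class degrees of similarity S.

def softmax_similarity(y, class_prototypes):
    d = np.array([min(np.linalg.norm(y - p) for p in protos)
                  for protos in class_prototypes])
    scores = np.exp(-d)
    return scores / scores.sum()      # degrees of similarity S per class

class_protos = [
    [np.array([0.0, 0.0])],           # class 0 prototypes (toy)
    [np.array([1.0, 1.0])],           # class 1 prototypes (toy)
]
S = softmax_similarity(np.array([0.1, 0.1]), class_protos)
print(S.argmax())                     # index of the most similar class
```

The maximum entry of S then plays the role of the confidence λ used by the global decision layer.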
Figure 3: Architecture for attack detection and validation process of RADNN.

Global decision-making (attack detection)
RADNN uses the recursive mean μ_i of the confidence λ to detect sudden drops in confidence. When a new data sample arrives at the system, μ is calculated as [16]. Then the m-σ rule is applied: possible attacks are actively detected when inequality (13) is satisfied,

λ_i < μ_i − m σ_i,  (13)

otherwise, if the inequality is not satisfied, the new data sample is assigned a label.
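The m-σ rule can be sketched as follows; the recursive mean/variance updates and the choice m = 3 are illustrative, and the sequence of confidences is a toy example:

```python
import numpy as np

# Global decision sketch: each new confidence `lam` is compared against the
# recursively updated mean of past confidences; a value more than m standard
# deviations below the mean flags a potential attack (sudden confidence drop).

def detect_attacks(confidences, m=3):
    flags = []
    mu, var, n = 0.0, 0.0, 0
    for lam in confidences:
        if n >= 2 and lam < mu - m * np.sqrt(var):
            flags.append(True)        # potential attack detected
        else:
            flags.append(False)
        # recursive (streaming) mean and population-variance update
        n += 1
        delta = lam - mu
        mu += delta / n
        var += (delta * (lam - mu) - var) / n
    return flags

conf = [0.95, 0.94, 0.96, 0.95, 0.94, 0.30]   # last sample: sudden drop
print(detect_attacks(conf))
```

Only the final sample, whose confidence collapses relative to the running statistics, is flagged; the ordinary fluctuations before it are not.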
When inequality (13) is satisfied, the arriving data sample is flagged as a potential attack and temporarily saved. The label for high-confidence data samples is given by equation (14):

label(Y) = argmax_{c=1,...,C} λ_c.  (14)

Experiments
In order to evaluate the robustness of RADNN to imperceptible attacks, we considered 1000 images from the ImageNet dataset attacked by the PerC algorithm. Deep learning approaches such as VGG-16 and ResNet were also used during this experiment.
In order to evaluate the approaches, the following metric has been considered: the detection rate, i.e., the proportion of attacked images that are correctly identified as attacks.
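Assuming the usual definition (the fraction of attacked images the method flags), the metric reduces to:

```python
# Detection-rate sketch: one boolean per attacked image, True if the attack
# was detected. The rate is the fraction of detected attacks.

def detection_rate(flags):
    return sum(flags) / len(flags)

print(detection_rate([True, False, True, True]))
```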

Results
Table 1 illustrates the different results obtained during the experiments.

Method            Detection rate
RADNN [18]        57.48%
ResNet-152 [11]   46.23%
VGG-16 [14]       38.31%
AlexNet [10]      37.72%

As demonstrated by Table 1, RADNN obtained better detection performance than its competitors. The reason for this difference is that traditional deep learning models lack robustness because they need to be trained on the specific attack to obtain high performance. On the other hand, RADNN is equipped with a detection mechanism that gives its structure the flexibility to withhold classification on data samples the algorithm is not confident about. Moreover, the transparent structure of RADNN allows users to inspect the network's decisions and visualize and understand them.

Conclusion
In this paper, we introduced the RADNN algorithm. The algorithm has a robust design that is able to detect attacks imperceptible to humans due to its density- and prototype-based nature. The experiments have shown that, unlike traditional approaches that need to be trained on the attacks to obtain high detection performance, RADNN is able to detect attacks without prior training on them thanks to its similarity-based confidence system.

[9] proposes the PerC algorithm, which creates adversarial examples by perturbing images through perceptual color distance. PerC makes it possible to use larger L_p-norm perturbations in the RGB space to create adversarial examples that are less perceptible to humans. Fig. 1 illustrates adversarial examples generated by PerC.

Figure 2: Training architecture for the Imagenet dog class.

Fig. 3 illustrates the architecture for the adversarial attack detection and validation process of the RADNN method. This architecture is composed of the following layers: features layer, local decision-making, and global decision-making.

Table 1: Results considering different methods for imperceptible attack identification.