EITGAN: A Transformation-based Network for recovering adversarial examples

Abstract: Adversarial examples have been shown to easily mislead neural networks, and many strategies have been proposed to defend against them. To address the problem that most transformation-based defense strategies degrade the accuracy of clean images, we propose an Enhanced Image Transformation Generative Adversarial Network (EITGAN). Positive perturbations are employed in the EITGAN to counteract adversarial effects while enhancing the classification performance of the samples. We also use an image super-resolution method to mitigate the effect of adversarial perturbations. The proposed method does not require modification or retraining of the classifier. Extensive experiments demonstrate that the enhanced samples generated by the EITGAN effectively defend against adversarial attacks without compromising human visual recognition, and that their classification performance is superior to that of clean images.


Introduction
Convolutional Neural Networks (CNNs) have been successfully applied to a wide range of computer vision tasks, including image classification [1], object detection [2], and semantic segmentation [3]. However, recent studies have shown that CNNs can be deceived by images with meticulously designed small perturbations that are imperceptible to human vision [4,5]. Perturbations that cause a CNN to misclassify an image into a different class are called adversarial perturbations, and the perturbed images are called adversarial examples. Adversarial examples pose a significant security threat to the further adoption of advanced computer vision systems. Thus, addressing the influence of adversarial examples remains a challenging problem.
Many studies have proposed strategies to defend against adversarial attacks [6,7]. At present, adversarial defenses are mainly divided into two categories: partial defense, which only detects adversarial examples at an early stage [8], and complete defense. Complete defenses are further divided into model-specific and model-agnostic methods, which resist adversarial attacks by modifying the classifier or the adversarial examples, respectively. Model-specific methods aim to regularize a specific model's parameters through adversarial training or parameter smoothing [9]. Such methods often require differentiable transformations, which are not only computationally expensive but also vulnerable to further attacks. Model-agnostic methods aim to remove adversarial perturbations from the input image domain by applying various transformations. These methods include Joint Photographic Experts Group (JPEG) compression [10], random image resizing & padding [11], total variation minimization [12], random Pixel Deflection (PD) [13], and image Super-Resolution (SR) [14]. Compared to model-specific methods, model-agnostic methods are simpler and faster, and are therefore often preferred. However, most model-agnostic methods lose part of the image information while removing adversarial perturbations, which decreases their classification performance on clean images.
In this study, we propose the Enhanced Image Transformation Generative Adversarial Network (EITGAN), an improved model-agnostic defense mechanism designed to generate enhanced samples that demonstrate superior classification performance compared to the original clean images. The defensive efficacy of the proposed EITGAN is not limited to image classification models. It can also be employed for adversarial defense and performance enhancement in other domains of image processing, such as medical image analysis models [15]. For example, Optical Coherence Tomography (OCT) images are frequently affected by noise and speckle [16,17]. These types of noise are similar to adversarial noise, and EITGAN can help improve the performance of convolutional neural networks used in OCT image analysis.
Figure 1 illustrates an example of an enhanced sample generated from an adversarial example. The proposed EITGAN is used to generate enhanced samples with positive perturbations that can transform the labels of adversarial examples to the correct class. The generator of EITGAN consists of a super-resolution network and a noise network. The super-resolution network is used to generate super-resolution images to mitigate the effect of adversarial perturbations, and the noise network is used to generate positive perturbations to offset the influence of adversarial perturbations. We show experimentally that adversarial examples exhibit better classification performance than clean images after positive perturbations are added. Our main contributions are summarized as follows:
• A positive perturbation is employed to guide the attention of classifiers toward distinctive regions in the images that correspond to the correct class labels, rather than background regions.
• We propose the EITGAN to generate enhanced samples with positive perturbations, which can effectively resist adversarial attacks without affecting human visual recognition, and which raise the classification performance on transformed images above that of clean images.
• The proposed method is a model-agnostic defense that does not require modification or retraining of the target classifier. It can easily surpass other model-specific defenses.
The remainder of this paper is organized as follows. In Section 2, we present the related adversarial attacks and defenses that are used in our work. In Section 3, the proposed EITGAN is introduced in detail, and the experimental results and analysis are presented in Section 4. Finally, the conclusion of this paper is provided in Section 5.

Related works
Here, we introduce several well-known adversarial attacks and defenses proposed in the literature that form the basis for our experiments. We only study model-agnostic defenses against non-targeted adversarial examples for image classification, although the same approach can be applied to other computer vision tasks.

Adversarial attack
Given a target classifier $F(\cdot)$ and a clean image $x$, let $y$ denote the ground-truth label of $x$. An untargeted attack modifies a sample that was initially classified correctly so that it is assigned to any incorrect class. For such attacks, the adversarial perturbation $\rho$ makes the adversarial example $\tilde{x} = x + \rho$ look the same as the clean image, while the predicted label is incorrect, i.e., $F(\tilde{x}) \neq y$. Targeted attacks are similar, but they must change the correct label into a specific incorrect label $y_t$: they seek $\tilde{x}$ such that $F(\tilde{x}) = y_t$ and $y_t \neq y$.
Next, we present a brief overview of several popular attacks against which we will evaluate our method.
Fast Gradient Sign Method (FGSM) [18] is a single-step attack that uses the sign of the gradient of the loss function $\ell$ w.r.t. the image to find the adversarial perturbation. For a given step size $\epsilon$, FGSM is defined as Eq (2.1):

$$\tilde{x} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x \ell(F(x), y)\right) \tag{2.1}$$

Projected Gradient Descent (PGD) [4] is a variant of the Basic Iterative Method (BIM [19]: an iterative version of FGSM) with uniform random noise as initialization, and it is one of the most powerful first-order attacks. PGD projects the adversarial example obtained in each iteration into the $L_p$ neighborhood of the clean image to constrain the adversarial perturbation. The procedure of each iteration is given by Eq (2.2):

$$\tilde{x}^{t+1} = \Pi_{x+S}\left(\tilde{x}^{t} + \alpha \cdot \mathrm{sign}\left(\nabla_x \ell(F(\tilde{x}^{t}), y)\right)\right) \tag{2.2}$$

where $\alpha$ is the step size of each iteration and $\Pi_{x+S}$ represents projecting the updated adversarial example into the range from $x - S$ to $x + S$.
DeepFool (DF) [20] is an untargeted iterative attack that aims to minimize the $L_2$ norm between clean images and adversarial examples. This method approximates the classifier by a linear decision boundary, and then looks for the smallest perturbation that moves the image across the boundary so that it is misclassified. The resulting perturbation is difficult for humans to detect.
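The update rules of FGSM and PGD can be illustrated with a minimal pure-Python sketch operating on a flat list of pixel values. The toy quadratic loss, the `grad_fn` callback, and all helper names are assumptions made for illustration only; an actual attack would obtain the gradient from the target classifier by backpropagation.

```python
import random

def sign(v):
    """Return -1.0, 0.0, or 1.0 according to the sign of v."""
    return float((v > 0) - (v < 0))

def fgsm(x, grad, eps):
    """Single-step FGSM: move each pixel by eps along the gradient sign."""
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

def project(x_adv, x, s):
    """Clip x_adv element-wise into the range [x - s, x + s]."""
    return [min(max(a, xi - s), xi + s) for a, xi in zip(x_adv, x)]

def pgd(x, grad_fn, alpha, s, steps, seed=0):
    """PGD: uniform random start, then signed-gradient steps with projection."""
    rng = random.Random(seed)
    x_adv = [xi + rng.uniform(-s, s) for xi in x]
    for _ in range(steps):
        stepped = fgsm(x_adv, grad_fn(x_adv), alpha)
        x_adv = project(stepped, x, s)
    return x_adv

# Toy loss (an assumption): l(x) = sum((x_i - t_i)^2) with gradient
# 2 * (x_i - t_i), so ascending the loss pushes x away from the point t.
reference = [100.0, 120.0, 140.0]

def grad_fn(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, reference)]

x = [101.0, 119.0, 140.0]
x_fgsm = fgsm(x, grad_fn(x), eps=2.0)
x_pgd = pgd(x, grad_fn, alpha=2.0, s=8.0, steps=20)
```

The values α = 2, S = 8, and 20 iterations mirror the attack settings reported in the experimental setup; whatever the gradient, the projection keeps every PGD pixel within S of the clean value.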
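Carlini and Wagner (C&W) [21] is an optimization-based attack that combines a differentiable surrogate of the model with a relaxation term to solve the perturbation minimization problem. The optimization is expressed as Eq (2.3):

$$\min_{\tilde{x}} \; \|\tilde{x} - x\|_2^2 + c \cdot \max\left(\max_{i \neq y_t} Z(\tilde{x})_i - Z(\tilde{x})_{y_t},\; -k\right) \tag{2.3}$$

where $Z(\tilde{x})_{y_t}$ denotes the logit value (the output before the softmax layer) corresponding to class $y_t$, $c$ is a constant that balances the two terms, and $k$ is the margin parameter.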
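To make the role of the margin parameter k concrete, the following sketch evaluates the margin term of the standard targeted C&W objective on a list of logits. The function name and the example logits are hypothetical; the distance term and the optimizer are deliberately left out.

```python
def cw_margin(logits, target, k=0.0):
    """Margin term max(max_{i != target} Z_i - Z_target, -k).

    The term is positive while some other class still has a larger logit
    than the target class, and it is floored at -k once the target class
    leads by at least k.
    """
    best_other = max(z for i, z in enumerate(logits) if i != target)
    return max(best_other - logits[target], -k)

logits = [2.0, 5.0, 1.0]
not_yet_fooled = cw_margin(logits, target=0)        # class 1 still leads
already_fooled = cw_margin(logits, target=1, k=1.0)  # target leads by 3 > k
```

With k = 0, as used in the experiments, the term stops decreasing as soon as the target class has the largest logit; a larger k asks for a more confident misclassification.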

Adversarial defense
Consider a target classifier $F(\cdot)$ and an input image $\hat{x}$, which may be a clean image $x$ or an adversarial example $\tilde{x}$. Adversarial defense aims to make the prediction $F(\hat{x})$ on the input image equal to the prediction $F(x)$ on the clean image $x$. The model-specific defense mentioned earlier modifies the classifier into $F'(\cdot)$ such that $F'(\hat{x}) = F(x)$, whereas the model-agnostic defense uses a transformation $G(\cdot)$ to change the image such that $F(G(\hat{x})) = F(x)$. In this study, we focus on model-agnostic defense.
Recently, adversarial defenses based on input image transformations have been proposed. Luo et al. [22] proposed a foveation-based mechanism, which crops the image around the object using ground-truth coordinate data and then scales it back to the original size. The JPEG compression defense [10] removes adversarial perturbations by compressing high-frequency noise information that is invisible to the human eye. However, this method is effective only for very small perturbations. Guo et al. [12] proposed image transformation using quilting and Total Variation Minimization (TVM). Image quilting refers to replacing input image patches with similar patches drawn from a bank of images. However, image quilting alone is often insufficient. Therefore, it is combined with TVM, which minimizes the total variation by optimizing the construction of substitute images. Xie et al. [11] performed image transformation by randomly resizing and padding the images. Randomness is also exploited by Prakash et al. [13], who use a wavelet-based pixel deflection transform to denoise the perturbation. Building on this, Mustafa et al. [14] proposed a method that combines wavelet denoising and image super-resolution against adversarial attacks.
The main shortcoming of most model-agnostic defenses is that the transformation degrades the quality of clean images, which leads to the loss of important information and decreases classification performance.

EITGAN
This section presents the specific implementation details of the proposed enhanced model-agnostic defense against adversarial attacks. In Subsection 3.1, we introduce the overall architecture of the EITGAN and explain the implementation process in detail. In Subsection 3.2, we describe the design of the generator, which is the core component of EITGAN. In Subsection 3.3, we briefly introduce the design of the discriminator, which is also a part of EITGAN. Finally, we summarize the overall algorithm design of the EITGAN in Subsection 3.4.

Overview of EITGAN
In this work, EITGAN is used to transform adversarial examples and enhance them, and the same can also be applied to clean images. The proposed method can effectively resist adversarial attacks and improve the accuracy on clean images. Figure 2 illustrates the overall architecture of EITGAN, which is mainly composed of the generator (G), the discriminator (D), and the target classifier (F). It should be noted that the target classifier is independent of EITGAN; therefore, the training process does not involve modifying the parameters of the target model. The classifier is employed in EITGAN primarily to predict the label of the generated sample. As shown in Figure 2, the input of the model is the adversarial example $X_{adv}$ and the output is the corresponding enhanced sample $X_{es}$.
The generator of the EITGAN consists of two parts, the image SR network (SR) and a noise network (Noise). The SR network draws on the idea of Mustafa et al. [14], which uses image SR technology to generate recovered adversarial examples $X_{sr}$. The difference is that we do not train the SR network in isolation, but use it as a part of the generator and train it together with the Noise network. The image generated by the SR network is not a conventional SR image, but an image of the same size as the input image. Keeping the output image size unchanged enables the image to merge better with the subsequent images while saving training resources and time. The Noise network is used to generate positive perturbations $X_{pert}$, which can offset the influence of adversarial perturbations and enhance the performance of the classifier. Finally, the enhanced sample is generated by adding the SR image $X_{sr}$ and the positive perturbation $X_{pert}$ in a specific ratio.
The generation of enhanced samples can be expressed as Eq (3.1):

$$X_{es} = \mathrm{conv}(\alpha X_{sr} + \beta X_{pert}) \tag{3.1}$$

where $\alpha$ and $\beta$ represent the ratios of SR images and positive perturbations, respectively. In this work, we set $\alpha = 0.6$ and $\beta = 0.4$, which are empirical values. The conv in Eq (3.1) represents a 1 × 1 convolution layer. Because the features after simple addition are relatively independent, a 1 × 1 convolution is employed to facilitate interaction between channels, enhancing the fitting of these features. For a given clean image $X$ and its corresponding ground-truth label $Y$, when the input is an adversarial example, the sample enhanced by EITGAN enables the target classifier to classify it into the correct category (i.e., $F(G(X_{adv})) = Y$ and $F(X_{adv}) \neq Y$). When the input is a misclassified clean image, EITGAN can also make it be classified correctly (i.e., $F(G(X)) = Y$ and $F(X) \neq Y$). The optimization losses $L_{GAN}$, $L_x$, and $L_y$, as shown in Figure 2, will be introduced in Subsection 3.4.

Generator of EITGAN
As the core component of the EITGAN, the generator mainly consists of the SR and Noise networks. Hence, we provide a detailed introduction to the specific structures and parameters of these two networks.

SR network
Inspired by the enhanced deep SR (EDSR) network [23], we design the SR network as shown in Figure 3. The SR network has twenty layers. The first layer is a down-sampling layer, the middle seventeen layers are composed of eight residual blocks and one convolutional layer, and the last two layers are up-sampling layers. In Figure 3, the solid line represents the implementation process of the network, and the dashed line represents the specific structure of the corresponding module. The three sets of data in the convolution brackets represent the input channel, output channel, and kernel size, respectively. Here, the kernel size of all convolution layers is 3 × 3.
Figure 3 shows that the input samples are first passed through the down-sampling layer, which halves the sample size from 224 × 224 to 112 × 112. We down-sample and then up-sample to avoid convolving high-resolution images, thereby reducing the computational complexity. Subsequently, the samples are inputted into eight residual blocks and one convolutional layer, which keep the sample size and channels unchanged. The output is then added to the output of the previous down-sampling layer and inputted into the up-sampling layer. We use pixelshuffle for up-sampling, which is adopted from the Efficient Sub-Pixel Convolutional Neural Network (ESPCN) [24]. The pixelshuffle is mainly composed of a convolution layer and a shuffle operation. Assuming that the input sample has $C$ channels, a size of $H \times W$, and an upsampling factor of $r$, the convolution operation reshapes the sample to $(r^2 C, H, W)$, and the shuffle operation further reshapes the sample to $(C, rH, rW)$. In this work, we set the upsampling factor $r = 2$, doubling the size of the sample and resulting in a sample size of 224 × 224. Finally, we pass the output of pixelshuffle into the deconvolution layer to reduce the number of channels from 64 to 3, thereby obtaining an SR image with the same size as the input sample.
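The shuffle step of pixelshuffle, which turns a tensor of shape (r²C, H, W) into (C, rH, rW), can be sketched in pure Python on nested lists. The channel ordering used here (input channel c·r² + i·r + j feeds the pixel at offset (i, j) of each r × r block of output channel c) follows the convention of PyTorch's PixelShuffle layer; the paper does not state the ordering, so this is an assumption, as is the function name.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, r*H, r*W)."""
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    assert channels_in % (r * r) == 0, "channel count must be divisible by r^2"
    c_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out

# Four 1 x 2 input channels with r = 2 become one 2 x 4 output channel.
x = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
y = pixel_shuffle(x, 2)
```

No values are created or discarded: the operation only moves information from the channel dimension into the spatial dimensions, which is why the preceding convolution must first expand the channels by a factor of r².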

Noise network
The structure of the Noise network is shown in Figure 4, which is divided into three parts. The top, middle, and bottom parts represent the encoder, residual blocks, and decoder, respectively. The Noise network has sixteen layers. Both the encoder and decoder consist of four layers, while the middle section incorporates four residual blocks, each comprising two layers. The input of the Noise network is the output of the SR network, which has the same shape as the original sample (224 × 224 × 3).
As shown in Figure 4, the input samples are encoded first. The encoder is composed of four convolutions, each followed by Instance Normalization (IN) and a Rectified Linear Unit (ReLU), and the convolution kernel size is 3 × 3. The first and third convolution layers each halve the sample size, the second layer reduces the sample size by two pixels, and the fourth layer keeps the size unchanged. During encoding, the number of channels increases exponentially, resulting in encoded samples of size 55 × 55 × 256. Subsequently, the encoded samples are inputted into four residual blocks for residual convolution, and each residual block keeps the sample size and channels unchanged. It is worth mentioning that the residual block differs from that in the SR network: Batch Normalization (BN) is used after each convolution layer to accelerate network convergence.
The output of the residual convolution is then inputted into the decoder. The decoder mirrors the encoder and likewise comprises four layers. However, the decoder employs deconvolution, with the final deconvolution layer followed by the hyperbolic tangent function (Tanh). The kernel sizes of the first and third layers of the decoder are both 4 × 4, and those of the second and fourth layers are both 3 × 3. The first and third deconvolution layers each double the sample size, the second layer increases the sample size by two pixels, and the fourth layer keeps the sample size unchanged. During decoding, the number of channels decreases exponentially, resulting in positive perturbations that have the same shape as the input samples, 224 × 224 × 3.
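The spatial sizes quoted for the Noise network can be checked with the usual output-size formulas for convolution and transposed convolution. In the sketch below the kernel sizes are taken from the text, whereas the strides and paddings are not given in the paper and are assumptions chosen so that the stated sizes are reproduced.

```python
def conv_out(n, k, s, p):
    """Spatial output size of a convolution (kernel k, stride s, padding p)."""
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k, s, p):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (n - 1) * s - 2 * p + k

# (kernel, stride, padding) per layer. Kernel sizes come from the text;
# strides and paddings are assumptions, not values reported by the authors.
ENCODER = [(3, 2, 1), (3, 1, 0), (3, 2, 1), (3, 1, 1)]
DECODER = [(4, 2, 1), (3, 1, 0), (4, 2, 1), (3, 1, 1)]

def trace(n, layers, size_fn):
    """Return the list of spatial sizes before and after every layer."""
    sizes = [n]
    for k, s, p in layers:
        n = size_fn(n, k, s, p)
        sizes.append(n)
    return sizes

enc_sizes = trace(224, ENCODER, conv_out)
dec_sizes = trace(enc_sizes[-1], DECODER, deconv_out)
print("encoder:", enc_sizes)
print("decoder:", dec_sizes)
```

Under these assumed settings the encoder goes 224 → 112 → 110 → 55 → 55 and the decoder goes 55 → 110 → 112 → 224 → 224, which agrees with the halving, two-pixel, and doubling steps described above.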

Generation of enhanced samples
Using only the SR images generated by the SR network, or only the positive perturbations generated by the Noise network, is not sufficient. Therefore, the positive perturbations are added to the SR images. The positive perturbations can neutralize adversarial perturbations in SR images and further enhance their classification performance. In the EITGAN, SR images focus on improving the visual quality, and positive perturbations focus on improving the classification performance. To avoid affecting the visual quality of the enhanced samples after addition, the ratio of the SR image is set to 0.6, and the ratio of the positive perturbation is set to 0.4. However, the features after simple addition are relatively independent, so a 1 × 1 convolution layer is used to make the features interact between channels, thereby enhancing the expression of features and producing the final enhanced samples.
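The weighted addition followed by a 1 × 1 convolution can be sketched as below, treating an image as a height × width × channel nested list. A 1 × 1 convolution is simply a linear map across the channels of each pixel; the identity weights used in the example are a placeholder assumption, since the real layer's weights are learned during training, and the function names are hypothetical.

```python
def conv1x1(pixel, weight, bias):
    """Apply a 1x1 convolution to one pixel: a linear map across channels."""
    return [sum(w * v for w, v in zip(row, pixel)) + b
            for row, b in zip(weight, bias)]

def enhance(x_sr, x_pert, weight, bias, alpha=0.6, beta=0.4):
    """Compute conv(alpha * X_sr + beta * X_pert) pixel by pixel."""
    out = []
    for row_sr, row_pert in zip(x_sr, x_pert):
        out_row = []
        for p_sr, p_pert in zip(row_sr, row_pert):
            mixed = [alpha * a + beta * b for a, b in zip(p_sr, p_pert)]
            out_row.append(conv1x1(mixed, weight, bias))
        out.append(out_row)
    return out

# Placeholder weights (assumption): identity mapping with zero bias.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero_bias = [0.0, 0.0, 0.0]

x_sr = [[[1.0, 0.5, 0.0]]]     # a 1 x 1 image with three channels
x_pert = [[[0.0, 0.5, 1.0]]]
x_es = enhance(x_sr, x_pert, identity, zero_bias)
```

With non-identity weights each output channel becomes a mixture of all three blended channels, which is the cross-channel interaction the 1 × 1 layer is meant to provide.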

Discriminator of EITGAN
As a part of EITGAN, the main purpose of the discriminator is to conduct adversarial learning with the generator, and to identify as accurately as possible whether or not the input sample is the original sample. We refer to the encoder in the Noise network to design the discriminator, which has five layers, including four convolutional layers and one fully connected layer, as shown in Figure 5. The upper part of Figure 5 shows the output results of each convolutional layer, and the lower part shows the specific structure of the convolutional network.
The inputs of the discriminator are the original samples and the corresponding enhanced samples. As shown in Figure 5, each convolutional layer halves the sample size. Except for the first layer, each convolution layer is followed by BN and LeakyReLU with a negative slope of 0.2. Finally, the output of the convolution is inputted into the fully connected layer to obtain the value of one neuron, and sigmoid activation is applied to constrain the value between 0 and 1. In the discriminator, an output value of 0 is judged as fake, and a value of 1 is judged as real.

Algorithm description
An algorithm description of the proposed enhanced defense scheme is provided in Algorithm 1. As depicted in Algorithm 1, the implementation of EITGAN is mainly divided into two parts: the generation process and the training process. The generation process has been introduced in Subsection 3.2, so we only discuss the training process here. In our experiments, training for 20 epochs gave better and more stable results. Therefore, to save training time, the number of training epochs is set to 20.
The overall optimization function for training the EITGAN is shown in Eq (3.2):

$$L = L_{GAN} + \lambda L_x + \mu L_y \tag{3.2}$$

where $L_{GAN}$ represents the GAN loss, which comprises $L(D)$ (line 9) and $L(G)$ (line 11) in Algorithm 1. $L_x$ (line 12) and $L_y$ (line 13) represent the pixel loss and category loss, respectively. The $\lambda$ and $\mu$ in Eq (3.2) denote the proportional coefficients corresponding to $L_x$ and $L_y$. In this work, in order to enhance the classification performance of samples, we set $\lambda = 0.01$ and $\mu = 100$. The optimization of $L_{GAN}$ can be described as Eq (3.3):

$$\min_G \max_D L_{GAN} = \mathbb{E}_{X}[\log D(X)] + \mathbb{E}_{X_{adv}}[\log(1 - D(G(X_{adv})))] \tag{3.3}$$

The GAN loss $L_{GAN}$ makes D and G play the minimax game to ensure that the generator generates samples that are as realistic as possible, so that the discriminator cannot identify whether a sample is the original sample. The pixel loss $L_x$ uses the $L_2$ norm to minimize the distance between the original sample and the generated sample, so that the generated sample looks the same as the original sample. The category loss $L_y$ uses the cross-entropy loss $l_{CE}$ to minimize the distance between the target classifier's predicted label of the generated sample and the ground-truth label, so that the predicted label is closer to the ground-truth label. After 20 epochs of training, we obtain the final generator, which generates enhanced samples that are classified correctly.

Algorithm 1
…
4: generate the super-resolution image of the adversarial example: $X_{sr} = SR(X_{adv})$
5: generate the positive perturbation of the super-resolution image: $X_{pert} = Noise(X_{sr})$
6: generate enhanced samples with $X_{sr}$ and $X_{pert}$: $X_{es} = \mathrm{conv}(\alpha X_{sr} + \beta X_{pert}) = G(X_{adv})$
7: // Training process
8: update the discriminator D with parameters $\theta_d$:
9: …
10: update the generator G with parameters $\theta_g$:
11: …
12: …
13: $\min L_y = \nabla_{\theta_g} \mathbb{E}_{X_{adv}}[l_{CE}(F(G(X_{adv})), Y)]$
14: end for
15: return $X_{es}$
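How the three loss terms are weighted can be illustrated for a single sample with the sketch below. It assumes the minimax form log(1 − D(G(X_adv))) for the generator's GAN term and leaves the choice of "original sample" for the pixel loss to the caller; neither detail is fixed by the surviving text, and all function names are hypothetical.

```python
import math

def l2_distance(a, b):
    """L2 norm of the difference between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cross_entropy(probs, true_class):
    """Cross-entropy between predicted class probabilities and the label."""
    return -math.log(probs[true_class])

def generator_loss(d_fake, x_orig, x_es, probs, y, lam=0.01, mu=100.0):
    """L = L_GAN + lam * L_x + mu * L_y for one sample.

    d_fake: discriminator output D(G(X_adv)) in (0, 1)
    probs:  class probabilities F(G(X_adv)) from the frozen classifier
    """
    l_gan = math.log(1.0 - d_fake)      # assumed minimax generator term
    l_x = l2_distance(x_orig, x_es)     # pixel loss
    l_y = cross_entropy(probs, y)       # category loss
    return l_gan + lam * l_x + mu * l_y

loss = generator_loss(d_fake=0.5, x_orig=[0.0, 0.0], x_es=[3.0, 4.0],
                      probs=[0.5, 0.5], y=0)
```

With λ = 0.01 and µ = 100 the category loss dominates the total, which matches the stated aim of prioritizing classification performance over pixel-level fidelity.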

Experiments and analysis
In this section, we demonstrate the existence of positive perturbations through experiments and verify the feasibility and effectiveness of the proposed method. In Subsection 4.1, the experimental setup is introduced in detail. The experiment in Subsection 4.2 evaluates the performance of EITGAN on different attacks and classifiers. The experiment in Subsection 4.3 compares EITGAN with state-of-the-art model-agnostic defense methods. The experiment in Subsection 4.4 evaluates the generalization performance of the EITGAN across different attacks and classifiers. Subsection 4.5 evaluates the performance of EITGAN on ImageNet-A.

Experimental setup
Datasets: Our experiments are performed on a dataset with 30,000 training images and 20,000 test images.This dataset is randomly chosen from ImageNet [25] and corresponds to 5 classes.Each class contains 10,000 images, and all these images have a size of 224 × 224 × 3. We choose 5 classes to reduce training costs and ensure high accuracy.
Networks: To evaluate the proposed EITGAN, we use Inception-V3 [26], ResNet-50 [27] and Inception ResNet-V2 [28] as the target classifiers. These classifiers are trained on our dataset and their parameters are saved for training EITGAN. To better evaluate the EITGAN, we set the classifiers' accuracy on the test images to 80%. While training the EITGAN, we refrain from conducting any re-training or fine-tuning on these classifiers.
Attacks: We use four adversarial attacks to generate adversarial examples, including FGSM [18], PGD [29], DF [4], and C&W [20]. For FGSM, we set the step size as ϵ = 2. PGD divides a single-step attack into multiple small-step attacks, where we set the small step size as α = 2 and the number of iterations as t = 20, and restrict the maximum perturbation size to 8. DF is a non-parametric attack that optimizes the amount of perturbation needed to misclassify an image. For C&W, we set the margin parameter as k = 0. All adversarial examples were generated for undefended classifiers.
Defenses: We compare the proposed EITGAN with a number of recently introduced state-of-the-art model-agnostic defense methods. These include JPEG compression [10], random image resizing & padding (Resize & Pad) [11], TVM [12], random PD [13], and image SR [14]. All experiments run on the same dataset and against the same attacks for a fair comparison.
Metrics: In the experiments, well-known metrics such as accuracy, recall, and precision [30,31] are used to evaluate the performance of the proposed model. Accuracy represents the proportion of correctly classified samples among all samples, as shown in Eq (4.1). Recall is the proportion of samples in the positive class that are correctly classified and can be calculated using Eq (4.2). Precision represents the proportion of samples predicted as the positive class that are correct and is obtained using Eq (4.3).

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4.1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4.2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4.3}$$

In Eqs (4.1)-(4.3), TP, FP, TN and FN represent true positive, false positive, true negative and false negative, respectively. In this work, TP refers to the samples that are correctly recognized as the current class. The sum of TP and FP represents all samples recognized as the current class. The sum of TP and FN represents all the samples of the current class.
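For a multi-class problem such as the five-class setting used here, these quantities are typically computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below shows one way to do so; the function name and the small three-class matrix are made up for illustration and are not experimental data.

```python
def per_class_metrics(confusion, cls):
    """Return (accuracy, recall, precision) for class `cls`, one-vs-rest.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                          # missed samples of cls
    fp = sum(confusion[i][cls] for i in range(n)) - tp     # wrongly assigned to cls
    tn = total - tp - fn - fp
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return accuracy, recall, precision

# Illustrative confusion matrix (made-up counts, 10 samples per true class).
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
acc0, rec0, prec0 = per_class_metrics(confusion, 0)
```

The denominators make the distinction explicit: recall divides by the row total (all samples that truly belong to the class), while precision divides by the column total (all samples predicted as that class).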
Environment: The hardware environment used in the experiments of this paper includes an Nvidia 2080Ti Graphics Processing Unit (GPU), a Ryzen 3600X Central Processing Unit (CPU), and 32 gigabytes (GB) of Double Data Rate 4 (DDR4) memory. The software environment comprises Windows 10, Python 3.8, and PyTorch 1.7.

Results and analysis
Table 1 shows the performance of the enhanced samples generated by the EITGAN on different classifiers and adversarial attacks. The first column in Table 1 lists the target classifiers. The second column shows different adversarial attacks, among which 'None' indicates that no adversarial attack is performed. The third and fourth columns show the accuracy of the original sample and the corresponding enhanced sample, respectively. The fifth column indicates the accuracy improvement of the enhanced sample. The last two columns represent the Peak Signal-to-Noise Ratio (PSNR) values of the enhanced sample $X_{es}$ and the original sample (clean image $X$ or adversarial example $X_{adv}$), respectively.
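For readers unfamiliar with the metric, PSNR in its standard definition is 10·log10(MAX² / MSE), where MSE is the mean squared error with respect to a reference image. The sketch below assumes 8-bit images (MAX = 255) flattened into lists; the paper does not state the value range it uses, so that choice and the function name are assumptions.

```python
import math

def psnr(reference, image, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(image):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - i) ** 2 for r, i in zip(reference, image)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

clean = [0.0, 0.0, 0.0, 0.0]
shifted = [25.5, 25.5, 25.5, 25.5]   # every pixel off by one tenth of the range
value = psnr(clean, shifted)
```

A higher PSNR means the processed image is closer to the reference, so a lower PSNR for the enhanced sample than for the original sample reflects the extra change introduced by the transformation.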
As shown in Table 1, the enhanced accuracy for the adversarial examples generated by FGSM is the highest across the three classifiers, which are 97.5%,93.4%, and 94.1%, respectively.The improved accuracy of the adversarial examples generated by PGD is better.The last two columns show that although the PSNR of the enhanced sample is lower than that of the original sample, it generally meets the image evaluation standard.It should be noted that all PSNRs are compared to the original clean images.As shown in Table 1, EITGAN can improve not only the accuracy of adversarial examples as well as the accuracy of clean images.It should be emphasized that the accuracy of the enhanced recovered adversarial examples is higher than that of the original clean images.Compared with the results of directly processing clean images, the results of processing the corresponding adversarial examples are better.This indicates that the positive perturbation generated after merging with the adversarial perturbation can better highlight the characteristics of the target discriminative regions.To better analyze the performance of the EITGAN, we evaluate the precision and recall rates of the enhanced samples in Tables 2 and 3, respectively.The structure of Table 2 is the same as that of Table 3.The first three columns represent classifiers, adversarial attack methods, and defense methods, respectively.'None' indicates that no processing is performed.'Class 0' to 'Class 4' respectively represent the labels corresponding to the five classes.We use Class N (N: 0-4) to represent the column of 'Class 0' to 'Class 4'.The data in Table 2 indicates the probability that all samples predicted to be class N are actually class N, and the calculation is described in Eq (4.3).The data in Table 3 indicates the probability that all samples that are actually in class N are predicted to be in class N, and the calculation is shown in Eq (4.2).The data in Tables 2 and 3  Table 2 shows that the distribution of 
the enhanced sample is relatively balanced across the five classes, which indicates that the high accuracy is not caused by one class being over-recognized and shows that the experiments are of practical significance. The last column of Table 2 shows that the overall accuracy of the enhanced samples generated by EITGAN is higher than that of the original clean images, which is consistent with the data in Table 1. In general, Inception-V3 has higher precision for class 0 samples, ResNet-50 has higher precision for class 1 samples, and the precision of Inception-ResNet-V2 is relatively even across the five classes. As shown in Table 3, because the recall for class 1 of the original clean images is the lowest, the recall for class 1 of the enhanced samples generated by EITGAN is also relatively low. Nevertheless, the overall recall of the enhanced samples is higher than that of the original samples. In addition, since the number of true positives (TP) in Table 3 is the same as in Table 2, the overall accuracy in the two tables is equal. Tables 2 and 3 are consistent with each other; both support the validity of our experiments and reflect the feasibility and high performance of the EITGAN. It can be concluded from Tables 2 and 3 that EITGAN can not only effectively resist adversarial attacks but also further enhance the classification performance of samples, which has considerable practical significance.
To further diversify the experimental scenarios, we also test EITGAN against the Adversarial Transformation Network (ATN) [32] and the Adversarial Example Generative Adversarial Network (AdvGAN) [33]. Table 4 presents the obtained test accuracy. In the table, "None" denotes the direct use of adversarial samples, while "EITGAN" indicates the use of samples generated by EITGAN. The experimental data demonstrate that EITGAN continues to exhibit significant defense capabilities against GAN-based adversarial attack methods (ATN and AdvGAN).
The Class Activation Map (CAM) [34] is a weakly supervised localization technique that helps explain the prediction of a CNN model by visualizing the discriminative area in the image. CAM requires replacing the last fully connected layer with a Global Average Pooling (GAP) layer, which means modifying and retraining most existing model structures. To reduce training costs, we choose to use Grad-CAM [35], an improved version of CAM. Grad-CAM calculates the weight of each feature map as the global average of its gradient and then sums the feature maps with these weights. Subsequently, a ReLU is applied so that only the pixels that have a positive impact on the target class are considered. Finally, the rectified map is up-sampled to the size of the original image and superimposed on it to obtain the required heat map. The heat maps range from blue to red, with red indicating a higher level of attention and a more significant impact on the results.
Figure 6 shows the Grad-CAM of the prediction by Inception-V3 on the clean, attacked, and recovered images. The clean images in the upper half of Figure 6 are classified correctly, and those in the lower half are classified incorrectly. In Figure 6, "CAM" denotes Grad-CAM. Column 5 shows the perturbations added to the clean images by the PGD attack, which are called adversarial perturbations. Column 6 shows the perturbations added to the adversarial examples by EITGAN, which we refer to as positive perturbations. Column 7 presents the result of combining positive perturbations with adversarial perturbations. Compared with the positive perturbation alone, the combined perturbation has more prominent distinctive regions and less random noise. It can be observed from Figure 6 that, when the clean images are classified correctly, EITGAN can recover the CAM on the adversarial examples to be consistent with that on the clean images. In the lower half, where the clean images are classified incorrectly, EITGAN can also recover the CAM on the adversarial examples to the discriminative regions corresponding to the correct class labels.
It can be concluded that positive perturbations can effectively offset the impact of adversarial perturbations and make the CAM on the recovered images consistent with the target discriminative regions. The enhanced samples selectively add positive perturbations, which effectively neutralize the adversarial perturbation and eventually help recover the model's attention toward the discriminative regions corresponding to the correct class labels.
We compare EITGAN with various defense mechanisms, as shown in Table 5. The first column shows the target classifier, and the second column shows the accuracy of the clean images. The third to sixth columns represent the accuracy of the adversarial examples generated by the four adversarial attack methods. 'No Defense' in Table 5 shows the performance of classifiers on original clean images and adversarial examples.
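Returning to the per-class results in Tables 2 and 3: the observation that equal TP counts imply equal overall accuracy can be illustrated with a confusion matrix. The following is a minimal sketch and not the authors' evaluation code; the function name `per_class_metrics` and the 3-class toy matrix are illustrative assumptions with made-up numbers.

```python
import numpy as np

def per_class_metrics(conf):
    """Per-class precision and recall, plus overall accuracy.

    conf[i, j] counts samples whose true class is i and predicted class is j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)            # true positives per class (the diagonal)
    predicted = conf.sum(axis=0)  # column sums: samples predicted as each class
    actual = conf.sum(axis=1)     # row sums: samples that truly belong to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    accuracy = tp.sum() / conf.sum()  # depends only on the TP counts and the total
    return precision, recall, accuracy

# Toy 3-class confusion matrix (made-up numbers, not taken from the paper's tables).
toy = [[8, 1, 1],
       [2, 6, 2],
       [0, 1, 9]]
precision, recall, accuracy = per_class_metrics(toy)
```

Precision and recall differ only in their denominators (column sums versus row sums) and share the same diagonal of true positives, which is why the overall accuracy computed from either table is identical.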

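The Grad-CAM steps described earlier in this section (gradient averaging, weighted sum of feature maps, ReLU, up-sampling) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: the feature maps and gradients are assumed to be already extracted from the last convolutional layer as arrays of shape (K, H, W), the function name `grad_cam` is hypothetical, and nearest-neighbour up-sampling stands in for the interpolation a real pipeline would typically use.

```python
import numpy as np

def grad_cam(feature_maps, gradients, out_size):
    """Compute a Grad-CAM style heat map from pre-extracted activations.

    feature_maps: (K, H, W) activations of a convolutional layer.
    gradients:    (K, H, W) gradients of the target class score w.r.t. those activations.
    out_size:     (height, width) of the input image.
    """
    # 1. Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))               # shape (K,)
    # 2. Weighted sum of the feature maps.
    cam = np.tensordot(weights, feature_maps, axes=1)   # shape (H, W)
    # 3. ReLU: keep only locations with a positive influence on the target class.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display (skipped when the map is all zeros).
    if cam.max() > 0:
        cam = cam / cam.max()
    # 4. Nearest-neighbour up-sampling to the input resolution (an assumption).
    h, w = cam.shape
    out_h, out_w = out_size
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return cam[np.ix_(rows, cols)]
```

The returned map can then be colour-mapped from blue (low) to red (high) and overlaid on the image to produce heat maps of the kind shown in Figure 6.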
Comparison and analysis
In Table 5, we compare EITGAN with five defense mechanisms. These compared defenses recover adversarial examples to some extent, and their results for C&W are relatively good compared with those for the other three adversarial attacks. However, the accuracy on clean images decreases after these defenses are applied, and the accuracy of the recovered samples still falls short of that of clean images. In comparison, the proposed EITGAN improves the accuracy of both adversarial examples and clean images. It should be emphasized that the accuracy of the enhanced samples exceeds that of the original clean images, which is a great advantage over other model-agnostic defenses. Table 5 also shows that EITGAN achieves its best recovery results for adversarial examples generated by FGSM, that its overall recovery performance is best on the Inception-V3 model, and that it outperforms the compared defense methods.
To further verify the superiority of the proposed method, we compare the recovered images generated by the compared defenses against the FGSM attack under the Inception-V3 model, as shown in Figure 7. It can be seen from Figure 7 that the sample after 'Resize & Pad' has black borders, the sample after 'TVM' is relatively blurry, and the sample after 'PD' has apparent noise. The samples generated by EITGAN and the other defense methods (JPEG compression, image SR) are similar to the clean images. This indicates that the proposed EITGAN preserves the visual quality of the sample while enhancing the accuracy of the recovered samples.
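The black borders noted for 'Resize & Pad' follow directly from how that kind of transformation works: the image is resized to a random size and then zero-padded at a random offset to a fixed output size. The sketch below illustrates the idea only; the function name `random_resize_and_pad`, the square output, and the nearest-neighbour resize are assumptions made here and are not taken from [11] or from the paper's experimental setup.

```python
import numpy as np

def random_resize_and_pad(image, out_size, rng):
    """Randomly resize a square image, then zero-pad it to (out_size, out_size).

    image:    (H, W, C) array with H == W <= out_size.
    out_size: side length of the padded output.
    rng:      a numpy.random.Generator supplying the randomness.
    """
    h, w = image.shape[:2]
    # Random target side length between the input size and the output size.
    new_size = int(rng.integers(h, out_size + 1))
    # Nearest-neighbour resize (a simplification chosen for this sketch).
    rows = np.arange(new_size) * h // new_size
    cols = np.arange(new_size) * w // new_size
    resized = image[np.ix_(rows, cols)]
    # Random placement inside a zero canvas; the zeros are the black borders.
    pad_total = out_size - new_size
    top = int(rng.integers(0, pad_total + 1))
    left = int(rng.integers(0, pad_total + 1))
    padded = np.zeros((out_size, out_size) + image.shape[2:], dtype=image.dtype)
    padded[top:top + new_size, left:left + new_size] = resized
    return padded
```

For example, calling `random_resize_and_pad(img, 12, np.random.default_rng(0))` on an 8x8 image returns a 12x12 image whose unfilled margin is zero, which is rendered as black.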

Evaluation across attacks and classifiers
To better reflect the performance of the EITGAN, we separately evaluate its generalization ability across different attacks and classifiers. Table 6 shows the generalization performance of EITGAN across different classifiers. The first column represents the target classifiers. The first row lists the target classifiers for the generalization experiments, and the second row lists the clean images and adversarial examples corresponding to the current classifier. The second column shows the state of the current sample. The row of ' ' indicates the accuracy of the original sample, and the row of ' ' indicates the accuracy of the enhanced sample generated by EITGAN. It can be seen from Table 6 that the original adversarial examples generalize relatively poorly between models, which directly limits the generalization of the proposed method between models. However, applying EITGAN does not have a major impact on the original sample, and some results are better than the original. For example, the result of Inception-ResNet-V2 on the enhanced sample, which is generated by EITGAN trained on ResNet-50 under PGD attack, is 1.7% higher than that of the original sample.
Table 7 shows the generalization performance of the EITGAN across different attacks. The first column represents the different training sets used to train the EITGAN. The first row represents the different test sets, where adversarial examples are generated by FGSM, PGD, DF, and C&W. We evaluate the EITGAN performance on three target classifiers. Each row of Table 7 gives the accuracy of the enhanced samples that EITGAN, trained on the training set named in that row, generates on the different test sets. The row of 'None' represents the result without the EITGAN. As shown in Table 7, the overall generalization performance of EITGAN is good, and each result across attacks is higher than the accuracy of the original clean image. All of the trained EITGAN models achieve their best generalization performance on the FGSM test set, and the EITGAN trained on the FGSM dataset also generalizes better to the other test sets. Overall, the experiments show that the proposed method not only resists adversarial attacks but also generalizes well.
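The train-on-one-attack, test-on-every-attack protocol behind Table 7 can be sketched as a simple grid evaluation. This is an illustrative outline, not the authors' code: `accuracy`, `cross_attack_table`, and the convention that a defense of `None` means "no EITGAN" are all assumptions introduced here, and the classifier and defenses are assumed to be plain callables on NumPy arrays.

```python
import numpy as np

def accuracy(classifier, images, labels):
    """Fraction of samples for which the classifier predicts the given label."""
    preds = classifier(images)
    return float(np.mean(preds == labels))

def cross_attack_table(defenses, test_sets, classifier):
    """Evaluate every (training attack, test attack) pair.

    defenses:  dict mapping a training-set name to a callable images -> images,
               or to None for the undefended baseline.
    test_sets: dict mapping a test-attack name to an (images, labels) pair.
    Returns a dict keyed by (training-set name, test-attack name).
    """
    table = {}
    for train_name, defend in defenses.items():
        for test_name, (images, labels) in test_sets.items():
            processed = images if defend is None else defend(images)
            table[(train_name, test_name)] = accuracy(classifier, processed, labels)
    return table
```

Each row of the resulting grid corresponds to one trained defense and each column to one test attack, matching the layout described for Table 7.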

Application on more classes
We also conduct experiments on ImageNet-A [36], a dataset of natural adversarial examples that fool current ImageNet classifiers. We use the well-trained Inception-V3 [26], ResNet-50 [27], and Wide ResNet-50 [37] as the target classifiers. These classifiers are trained on ImageNet-1000. Table 8 compares the performance of EITGAN with other defense methods on ImageNet-A. In Table 8, the first column lists the different defense methods, where 'None' means no defense, and the first row lists the three well-trained target classifiers. The values in Table 8 are the recognition accuracies of the classifiers on samples processed by the different defense methods. It can be seen from Table 8 that most of the five compared defenses cannot improve the accuracy of the samples. Even where they do, the improvement is limited, with a maximum of approximately 0.1%. In contrast, EITGAN improves the accuracy of all three target classifiers, by up to 2.3%. This shows that EITGAN can also be applied to more classes and performs better than other model-agnostic defenses.

Conclusions
Motivated by the vulnerability of CNNs to adversarial attacks and by the fact that most model-agnostic defenses decrease the accuracy on clean images, we proposed an enhanced defense mechanism, EITGAN. EITGAN is itself a model-agnostic defense and does not need to modify or retrain the target classifier. In this work, we used image SR to mitigate the effect of adversarial perturbations, together with positive perturbations to further enhance the classification performance of the recovered samples. Extensive experiments showed that the proposed EITGAN outperformed the state-of-the-art defenses: it not only improves the accuracy of clean images but also raises the accuracy of recovered adversarial examples above that of the original clean images, greatly improving the defense performance of the target classifier. The enhanced recovered images generated by EITGAN also have good visual quality and generalization performance.
However, as the number of classes in the target classifier increases, the effectiveness of EITGAN in defending against adversarial samples gradually diminishes. On a thousand-class task, the classification accuracy improved by only 2.3%, which is weaker than the defense achieved in the five-class scenario. In future work, we will explore the impact of the number of classes on defense effectiveness. We will also try to further improve the performance of EITGAN on more classes and further explore the application of positive perturbations to clean images.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

Figure 1. Example of an enhanced sample on an adversarial example.

Figure 1 illustrates an example of an enhanced sample on an adversarial example. The proposed EITGAN is used to generate enhanced samples with positive perturbations that can transform the labels of adversarial examples to the correct class. The generator of EITGAN consists of a super-resolution network and a noise network. The super-resolution network is used to generate super-resolution images to mitigate the effect of adversarial perturbations, and the noise network is used to generate positive perturbations to offset the influence of adversarial perturbations. We show that adversarial examples exhibit better performance than clean images after positive perturbations are added. Our main contributions are summarized as follows:

Figure 3. Architecture of the SR network.

Figure 4. Architecture of the noise network.

Algorithm 1. Implementation of EITGAN
Input: The adversarial examples X_adv
Output: The enhanced samples X_es
1: Given clean image X and ground-truth label Y;
2: for number of training epochs do
3: // Generation process
4: ...

... are calculated based on the EITGAN obtained from the 20th training epoch. The last column shows the overall accuracy of the corresponding data in each row. Bold font indicates the data of clean images that have not been attacked or defended.

Figure 6. Visualization of EITGAN against PGD attack on clean images that are classified correctly and incorrectly. The first column shows the clean images. The subsequent three columns show the CAM on clean, PGD-attacked, and recovered images. The fifth column shows the perturbations (magnified by 20x) added to the clean image by PGD, and the sixth column shows the perturbations (magnified by 5x) added to the adversarial example by EITGAN. The last column shows the difference between the clean and recovered images (magnified by 5x).
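In the notation of Algorithm 1 (clean image X, adversarial example X_adv, enhanced sample X_es), the three perturbation columns of Figure 6 are related by a simple identity. The symbols \delta_{adv} and \delta_{pos} are shorthand introduced here for illustration and do not appear in the original text.

```latex
\delta_{adv} = X_{adv} - X, \qquad
\delta_{pos} = X_{es} - X_{adv}, \qquad
X_{es} - X = \delta_{adv} + \delta_{pos}
```

The last column of the figure (the difference between the clean and recovered images) is therefore the sum of the adversarial perturbation in column 5 and the positive perturbation in column 6, before the display magnification is applied.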

Figure 7. Comparison of recovered images produced by different defenses against FGSM attack. The first column shows five clean images corresponding to the five classes. The next five columns display the recovered samples of the five compared approaches. The last column shows the enhanced samples generated by EITGAN.

Table 1. The performance of enhanced samples generated by EITGAN on different classifiers and adversarial attack methods.

Table 2. The Precision (%) of enhanced samples generated by EITGAN.

Table 3. The Recall (%) of enhanced samples generated by EITGAN.

Table 4. Test accuracy on ATN and AdvGAN.

Table 5. The accuracy (%) of different defense methods.

Table 8. The accuracy (%) of different defenses on ImageNet-A.