Adversarial Examples Generation Algorithm through DCGAN

In recent years, with the popularization of deep learning technology, the security of deep neural networks has drawn more and more attention. A wide variety of machine learning algorithms can attack neural networks and cause them to misclassify target samples. However, previous attack algorithms generate a unique adversarial example by computing against the corresponding model; they cannot extract the attack features and generate corresponding samples in batches. In this paper, a Generative Adversarial Network (GAN) is used to learn the distribution of the adversarial examples generated by FGSM and to build a generative model, so that corresponding adversarial examples can be generated in batches. The experiments show that when a Deep Convolutional Generative Adversarial Network (DCGAN) is used to extract and learn the attack characteristics of the FGSM algorithm, the generated adversarial examples attack the original model with a success rate of 89.1%; for the attack on the model with added protection, the success rate increased by 30.3%. This suggests that the adversarial examples generated by the GAN are more effective and aggressive. This paper thus proposes a new approach to generating adversarial examples.


Introduction
With the rapid development of information technology, deep learning has gradually been recognized and accepted. In many domains, deep learning can complete preset tasks and outperform traditional methods, achieving even human-competitive results. However, while deep learning technology is widely used, the importance of its security cannot be overlooked.
According to existing research, deep neural networks are fragile and vulnerable to attacks. Szegedy et al. [1] first found that, in the field of image recognition, even a very slight modification of an image can cause a classifier to misclassify it. Such modifications make deep neural networks (DNNs) vulnerable to attacks. As attack algorithms become more advanced, adversarial examples become more aggressive and cause more damage. As shown in Fig. 1, the picture at the bottom is generated from the picture at the top by a certain algorithm, and the neural network classifies it incorrectly. This is very dangerous, since deep learning is widely used in security-sensitive fields such as unmanned vehicles, face recognition, bank identity recognition, etc. It is therefore necessary to study these phenomena. This research direction is called Adversarial Machine Learning (AML): under the intentional design of an attacker, the target model is made to misclassify its input data.

In recent years, a large number of papers about adversarial examples and GAN have been published.
In the area of adversarial examples, Szegedy et al. [1] first proposed the concept of adversarial examples in 2014 and introduced the L-BFGS attack algorithm. They pointed out that deep neural networks are highly expressive models, but this expressiveness also leads to counter-intuitive properties: the input-output mapping learned by a deep neural network is discontinuous to a large extent, so the network can be made to misclassify an image by applying an imperceptible perturbation, which is found by maximizing the prediction error of the network. Goodfellow et al. [2] proposed FGSM, a fast and efficient attack algorithm, as well as a defense idea, which can be summarized as follows: for an input instance x, compute the perturbation η = ε · sign(∇_x J(θ, x, y)); that is, FGSM calculates the gradient of the loss at the target data and perturbs the input along its sign so as to cause either an arbitrary or a specified misclassification. Kurakin et al. [3] put forward BIM, an iterative algorithm based on FGSM, which improves the attack effect of adversarial examples through repeated gradient calculations. Papernot et al. [4] formalized the space of attacks against DNNs and introduced a new type of algorithm, JSMA. On the basis of a precise understanding of the mapping between the inputs and outputs of DNNs, adversarial examples were crafted with a success rate of 97%. JSMA determines the sensitive positions in a sample by computing the sample's saliency map, and achieves good results at the cost of changing only a small amount of data. Su et al. [5] proposed a new single-pixel adversarial example generation approach based on differential evolution (DE). Papernot et al. [6] proposed a defense mechanism called defensive distillation to reduce the aggressiveness of adversarial examples against deep learning models. Carlini et al. [7] introduced the C&W algorithm, which can be applied under the L0, L2 and L∞ norms. Experiments show that the C&W algorithm succeeds with 100% probability on both distilled and undistilled neural networks, which proves that defensive distillation does not significantly improve the robustness of neural networks. Moosavi-Dezfooli et al. [8] proposed the DeepFool algorithm to accurately calculate the robustness of state-of-the-art deep classifiers on large-scale data sets, quantifying the robustness of these classifiers by the perturbation needed to fool the network. Moosavi-Dezfooli et al. [9] also proposed a universal perturbation attack, which generalizes well across neural networks. Sarkar et al. [10] proposed two attack methods, UPSET and Houdini: UPSET generates universal perturbations for a target class, while Houdini generates image-specific perturbations. Baluja et al. [11] proposed a network for generating adversarial examples, which runs fast and provides exceptionally diverse outputs.
In the field of GANs, Goodfellow et al. [12] proposed the Generative Adversarial Network in 2014, analyzing the advantages and disadvantages of GANs and pointing out future research directions and extensions. In the same year, Mirza et al. [13] proposed the conditional generative adversarial network (CGAN), a conditional version of the GAN, and showed that CGAN can generate MNIST digits conditioned on class labels. Arjovsky et al. [14] introduced WGAN in 2017; this paper used the theory of maximum likelihood estimation to explain the significance of learning a probability distribution in unsupervised learning, used one distribution to approximate the true distribution, and solved the problem by minimizing the KL divergence between the two distributions. Li et al. [15] pointed out that although progress has been made in generative image modeling, successfully generating high-resolution and diverse samples from complex data sets such as ImageNet [16] remains an elusive goal. They therefore proposed applying orthogonal regularization to the generator to make it amenable to a simple "truncation trick", allowing precise control of the trade-off between sample fidelity and variety by truncating the latent space. This modification brought the model to a new state of the art in class-conditional image synthesis. Zhang et al. [17] proposed an alternative generator architecture for generative adversarial networks. The new architecture [18] can control high-level attributes of the generated image, such as hairstyle and freckles, and the generated images score better on several evaluation criteria.
Previous adversarial-example algorithms generate examples directly from computations against the attacked neural network, whereas the algorithm proposed in this paper extracts features from already-generated adversarial examples through a GAN, so that similar adversarial examples can be generated in batches, with both effectiveness and high aggressiveness.
Related Works

FGSM
Ian Goodfellow [2] found that the precision of an individual input feature is limited in many problems. For example, digital images usually use only 8 bits per pixel, so they discard all information below 1/255 of the dynamic range. Because feature precision is limited, for an input x and a perturbed input x_A = x + η, the classifier cannot be expected to respond differently to x and x_A if every element of the perturbation η is smaller than the feature precision. Formally, for problems with well-separated classes, we expect the classifier to assign the same class to x and x_A as long as ||η||_∞ < ε, where ε is small enough to be discarded by the sensor or the data storage device.
Consider the dot product between a weight vector v and an adversarial example x_A: v^T x_A = v^T x + v^T η. The adversarial perturbation increases the activation by v^T η. This increase is maximized, subject to the max-norm constraint on η, by assigning η = ε · sign(v). If x has n dimensions and the average magnitude of the weight vector elements is m, the activation will increase by εmn. Since ||η||_∞ does not grow with the dimensionality of the problem, while the activation change caused by the perturbation η can grow linearly with n, in high-dimensional problems many infinitesimal changes to the input can add up to one large change in the output.
That is to say, if the input of even a simple linear model has sufficient dimensionality, adversarial examples can be generated. Under this view, suppose that θ is the model parameter vector, x is the model input, y is the label associated with x, and J(θ, x, y) is the loss function used to train the neural network. The cost function can be linearized around the current value of θ to obtain an optimal max-norm-constrained perturbation:

η = ε · sign(∇_x J(θ, x, y))

Since the formula only requires the gradient with respect to the n-dimensional input, this algorithm is called the Fast Gradient Sign Method (FGSM) [19], and the required gradient can be computed efficiently with backpropagation [20].
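As a concrete illustration, the following is a minimal sketch of the one-step FGSM update. It assumes PyTorch (the paper does not specify a framework) and pixel values in [0, 1]; the ε value is illustrative only.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.3):
    """One-step FGSM: x_adv = x + epsilon * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # J(theta, x, y)
    loss.backward()                        # backpropagation fills x.grad
    x_adv = x + epsilon * x.grad.sign()    # step along the gradient sign
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```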

DCGAN
DCGAN is a method, following Ian J. Goodfellow's pioneering 2014 GAN paper, that combines GANs with convolutional networks to address the instability of GAN training [21]. CNNs have long achieved great success in supervised learning, such as large-scale image classification and object detection, but had made little comparable progress in unsupervised learning [22]. This inspired Alec Radford to propose DCGAN, which combines CNNs with GANs and demonstrated impressive results in unsupervised learning [23]. Training on a large number of different data sets fully demonstrated that the generator and discriminator of DCGAN learn a rich hierarchy of representations of object parts and scenes [24].
The main reasons why DCGAN improves the stability of GAN training are as follows:
1. Strided convolutions are used instead of pooling and up-sampling layers [25]. Convolution is effective at extracting image features, and convolutional layers also replace the fully connected layers.
2. Almost every layer in the generator G and the discriminator D applies batch normalization to the output of the feature layer, which speeds up training and improves its stability.
3. The LeakyReLU activation function is used in the discriminator instead of ReLU to prevent sparse gradients. ReLU is still used in the generator, with Tanh in the output layer [26].
4. The Adam optimizer is used for training, with a best learning rate of 0.0002.
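To make the four points above concrete, here is a minimal DCGAN generator/discriminator pair for 28 × 28 MNIST images, written as a PyTorch sketch; the framework and the exact channel widths are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Noise vector -> 28x28 image via fractionally-strided convolutions,
    with batch norm, ReLU inside and Tanh at the output (points 1-3)."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 128, 7, 1, 0, bias=False),  # 1x1 -> 7x7
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),     # 7x7 -> 14x14
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 1, 4, 2, 1),                   # 14x14 -> 28x28
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    """Strided convolutions instead of pooling, LeakyReLU to avoid
    sparse gradients (points 1 and 3)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),   # 28 -> 14
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),         # 14 -> 7
            nn.Conv2d(128, 1, 7, 1, 0),                           # 7x7 -> 1 logit
        )

    def forward(self, x):
        return self.net(x).view(-1)

# Point 4: Adam with learning rate 0.0002 (beta1 = 0.5 follows the DCGAN paper).
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
```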

Adversarial Training
In order to improve the robustness of a neural network, the network can be regularized by adversarial training. When the data is disturbed by adversarial perturbations, the adversarial training process can be seen as trying to minimize the worst-case error. This training process can also be interpreted as adding U(−ε, ε) noise to the inputs and minimizing an upper bound on the expected cost over the noisy examples. Adversarial training can further be seen as a form of active learning in which the model requests labels for new points; here the human labeler is replaced with a heuristic labeler that copies labels from nearby points.
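One common form of this idea, sketched below under assumptions (PyTorch as the framework; the mixing weight α and the ε value are illustrative, not the paper's settings), is to minimize a weighted sum of the clean loss and the loss on FGSM-perturbed inputs:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.25, alpha=0.5):
    """Minimize alpha*J(theta, x, y) + (1 - alpha)*J(theta, x_adv, y),
    where x_adv is an FGSM example built against the current parameters."""
    # Build FGSM examples from the current model state.
    x_req = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_req), y).backward()
    x_adv = (x + epsilon * x_req.grad.sign()).clamp(0, 1).detach()

    # Optimize the mixed clean/adversarial objective.
    optimizer.zero_grad()
    loss = (alpha * F.cross_entropy(model(x), y)
            + (1 - alpha) * F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```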

Definition
In the remainder of the paper we use the following notation: M* represents the neural network model trained by *, T(*) stands for training on the * data, FGSM(M*) denotes the FGSM attack against the model M*, D(*) signifies the adversarial examples generated by *, and G(*) symbolizes the DCGAN generator trained on *. For example, G(D(FGSM(M*))) denotes the DCGAN generator trained on the adversarial examples produced by the FGSM attack against the model M*.

Algorithm Introduction
In order to verify the effectiveness and aggressiveness of the new adversarial examples generated by the DCGAN, the experiment is divided into two parts:

Unprotected Aggressiveness
The MNIST dataset is trained to obtain the original model, adversarial examples are generated against it with FGSM, and the DCGAN generator is trained on those examples to produce new adversarial examples. If the generated images can attack the original model, it means that the method is valid under the condition of the unprotected model.

Protected Aggressiveness
The MNIST dataset is trained according to the model set in Section 4.2.2, while the corresponding adversarial training is performed on the model at the same time, yielding a model M_defence that can defend against adversarial examples. Adversarial examples are then generated according to the steps in Section 4.3.1, as shown in Fig. 3. If the generated images can attack the model M_defence, it means that the method is valid under the condition of the protected model.
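Reading the two settings together, the overall procedure can be summarized in the notation of the Definition section. The sketch below is schematic pseudocode only: every name in it (M_mnist, T, FGSM, D, G, adversarial_training, Acc, sample) is a placeholder for a step described in the text, not a function from the paper.

```python
# Schematic pseudocode in the notation of the Definition section.
M_mnist = T(MNIST)                 # train the CNN classifier on MNIST
adv     = D(FGSM(M_mnist))         # FGSM adversarial examples against it
G_adv   = G(adv)                   # DCGAN generator trained on those examples

M_defence = adversarial_training(MNIST, adv)   # protected model

attack_acc_unprotected = 1 - Acc(M_mnist,   G_adv.sample())  # first setting
attack_acc_protected   = 1 - Acc(M_defence, G_adv.sample())  # second setting
```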

Experimental Environment
The operating system used in the experiment is Ubuntu 18.04, the CPU is an Intel Xeon(R) E5-2609 v4 @ 1.70 GHz, and the GPU is an Nvidia 1080Ti with 8 GB of memory.

MNIST
MNIST (Mixed National Institute of Standards and Technology database) is a computer vision data set containing 70,000 gray-scale pictures of handwritten digits of 28 × 28 pixels, each with a corresponding label. The data set is divided into two parts: a training set of 60,000 images and a test set of 10,000 images. The 60,000 training images are further divided into a training set of 55,000 images and a validation set of 5,000 images. The MNIST data set is widely used in deep learning and computer vision.
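For reference, the split described above can be reproduced, for example, with torchvision (an assumption; the paper does not state how the data was loaded):

```python
import torch
from torchvision import datasets, transforms

# 60,000 training images, of which 5,000 are held out for validation.
train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
train_set, val_set = torch.utils.data.random_split(train_full, [55_000, 5_000])
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())
```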

Parameters
Three kinds of neural network models are used in this experiment: the CNN classification model, the DCGAN generator, and the DCGAN discriminator.
The CNN model is composed of two convolutional layers with 32 filters of size 3 × 3 and stride 1, followed by max pooling; two convolutional layers with 64 filters of size 3 × 3 and stride 1, followed by max pooling; two 200-unit fully connected layers; and a 10-unit Softmax layer for classification, as shown in Tab. 1.
The specific and detailed structural parameters of these three networks are shown in Tabs. 1-3:
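As a sketch of the classifier described above (the padding and the ReLU activations are assumptions, since the text does not specify them; PyTorch is assumed as the framework):

```python
import torch.nn as nn

# Two 32-filter 3x3 convolutions (stride 1), max pooling, two 64-filter 3x3
# convolutions (stride 1), max pooling, two 200-unit fully connected layers,
# and a 10-way softmax output.
cnn = nn.Sequential(
    nn.Conv2d(1, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 32, 3), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3), nn.ReLU(),
    nn.Conv2d(64, 64, 3), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 4 * 4, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 10),  # softmax is applied inside the cross-entropy loss
)
```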

Experimental Results
The evaluation indicators of this article are as follows. The cost function is the cross-entropy function

J(θ, x, y) = −Σ_i y_i log ŷ_i,

and the attack success rate is measured as

Attack Acc = 1 − Acc,

where Acc is the classification accuracy of the attacked model on the generated adversarial examples. Aiming at the effectiveness and aggressiveness of the adversarial examples, the experiment was divided into two groups.
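A minimal way to compute this metric, assuming a PyTorch model and labeled adversarial inputs:

```python
import torch

def attack_accuracy(model, x_adv, y_true):
    """Attack Acc = 1 - Acc: the fraction of adversarial inputs misclassified."""
    with torch.no_grad():
        preds = model(x_adv).argmax(dim=1)
    return 1.0 - (preds == y_true).float().mean().item()
```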

Conclusion
In this paper, in-depth research on adversarial attack algorithms is conducted to find a method of generating adversarial examples in batches. In recent years, GAN technology and adversarial learning have developed rapidly. There are more than 200 types of networks in the GAN field, with diverse performance. As new attack methods emerge in adversarial learning, there will be more and more options for combining GANs with adversarial learning. Although this article demonstrates the feasibility of combining GANs with adversarial learning, there are still some shortcomings in generality. In the next stage, we will actively seek more options and improve the performance of the attack model.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.