Vulnerability of classifiers to evolutionary generated adversarial examples
Introduction
Deep networks have become the state-of-the-art machine learning methods in a range of applications, including natural language processing, image recognition, and reinforcement learning (Goodfellow, Bengio, & Courville, 2016).
In the area of pattern recognition, deep and convolutional neural networks in particular have achieved several human-competitive results (Bengio, 2009, Hinton, 2007, Krizhevsky et al., 2012). Given these results, the question arises whether these methods achieve capabilities similar to human vision, such as generalization. This paper deals with a property of machine learning models that demonstrates a difference. Consider a classifier and an image correctly classified by the classifier as a certain class (for example, an image of a hand-written digit 5). It is possible to change the image so slightly that to human eyes there is almost no difference, yet the classifier classifies the image as something else entirely (such as the digit zero).
This counter-intuitive property of neural networks was first described in Szegedy et al. (2013). It is connected to the instability of a neural network with respect to small perturbations of its inputs. Such perturbed examples are called adversarial examples (Szegedy et al., 2013). Adversarial examples differ only slightly from correctly classified examples drawn from the data distribution, yet they are classified incorrectly by the classifier trained on that data. Not only are they classified incorrectly, they can even be classified as a class of the attacker's choice.
This paper examines vulnerability to adversarial examples across a variety of machine learning methods. We propose a genetic algorithm for generating adversarial examples. Though the evolutionary search for adversarial examples is slower than the techniques described in Goodfellow et al., 2014, Szegedy et al., 2013, it makes it possible to obtain adversarial examples without access to the model's weights. Thus, we have a unified approach for a wide range of machine learning models, including not only neural networks but also support vector machine (SVM) classifiers, decision trees, and possibly others. The only requirement of this approach is the possibility to query the classifier to evaluate a given example. From this point of view, the misclassification of adversarial examples represents a security flaw in the corresponding classifier.
The main contribution of this work is an original search procedure capable of so-called black-box attacks on any image classifier. Black-box attacks do not require access to the classifier's inner structure (such as the weights of a neural network); in fact, they do not need to know the type of the classifier at all. Most of the successful methods for generating adversarial images described in the literature are not suitable for black-box attacks because they rely on a particular classifier type and access to its parameters. In this paper, we present and test our genetic algorithm for generating adversarial examples for several image classifiers, including two popular deep network architectures. We demonstrate that vulnerability to adversarial examples is not a special property of deep networks; such examples can easily be generated for other classifiers as well. Our general approach further allows us to draw conclusions about the robustness of different classifiers and possible measures against adversarial attacks.
This paper is organized as follows. Section 2 reviews related work on adversarial examples, and Section 3 describes the proposed genetic algorithm. Sections 4 and 5 present the results of our experiments with two image data sets. Section 6 discusses the results, and Section 7 concludes the paper. An overview of the machine learning models used and their settings is included in the Appendix.
Related work
Adversarial examples were first introduced in Szegedy et al. (2013). The paper shows that, given a trained network, it is possible to arbitrarily change the network's prediction by applying an imperceptible non-random perturbation to an input image. Such perturbations are found by optimizing the input to maximize the prediction error. The box-constrained Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (L-BFGS) was used for this optimization.
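The box-constrained optimization described above can be sketched as follows. This is an illustrative white-box formulation in the spirit of Szegedy et al. (2013), not their exact code; the `model_grad_loss` callback is a hypothetical helper returning the classifier's loss for a chosen target class together with its gradient with respect to the input.

```python
# Sketch: box-constrained L-BFGS adversarial search (illustrative, white-box).
# `model_grad_loss(x)` is a hypothetical helper: it returns (loss, gradient)
# of the classifier's loss toward the attacker's target class at input x.
import numpy as np
from scipy.optimize import minimize


def adversarial_lbfgs(x_orig, model_grad_loss, c=0.1):
    """Minimize c*||x - x_orig||^2 + loss(x), with pixels kept in [0, 1].

    x_orig is a flattened image; the box constraints keep the result a valid image.
    """
    def objective(x_flat):
        loss, grad = model_grad_loss(x_flat)
        diff = x_flat - x_orig
        # Objective value and its gradient, returned together for jac=True.
        return c * np.dot(diff, diff) + loss, 2.0 * c * diff + grad

    res = minimize(objective, x_orig, jac=True, method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * x_orig.size)
    return res.x
```

The distance term keeps the adversarial image close to the original, while the loss term pushes the prediction toward the target class; the constant `c` trades the two off.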
On some data sets, such as ImageNet (Deng
Evolutionary algorithm
In this section, our original evolutionary algorithm for generating adversarial examples is described. The algorithm searches for a pattern (image) that is misclassified by a trained machine learning model. The whole process works in a black-box scenario, i.e. the algorithm only queries the model for the classification of various input images; it does not use any particular knowledge about the model, nor does it have access to its inner parameters. Thus, the approach is general and can be
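A black-box search of this kind can be sketched as a simple genetic algorithm. The operators below (truncation selection, uniform crossover, Gaussian mutation) and the `query_proba` callback are illustrative choices, not the paper's exact algorithm; the key point is that the classifier is accessed only through probability queries.

```python
# Sketch: black-box GA for adversarial examples (illustrative operators).
# The only access to the model is query_proba(x) -> vector of class probabilities.
import numpy as np

rng = np.random.default_rng(0)


def evolve_adversarial(x_orig, query_proba, target, pop_size=30,
                       generations=200, sigma=0.05, penalty=1.0):
    """Evolve a perturbed copy of x_orig that the model assigns to `target`."""
    # Initial population: noisy copies of the original image, clipped to [0, 1].
    pop = np.clip(x_orig + rng.normal(0, sigma, size=(pop_size,) + x_orig.shape),
                  0.0, 1.0)
    for _ in range(generations):
        # Fitness: target-class probability minus a distance penalty,
        # so the result stays visually close to the original.
        fit = np.array([query_proba(ind)[target]
                        - penalty * np.linalg.norm(ind - x_orig)
                        for ind in pop])
        order = np.argsort(fit)[::-1]
        parents = pop[order[:pop_size // 2]]           # truncation selection
        # Uniform crossover between random parent pairs, then Gaussian mutation.
        idx = rng.integers(0, len(parents), size=(pop_size, 2))
        mask = rng.random((pop_size,) + x_orig.shape) < 0.5
        children = np.where(mask, parents[idx[:, 0]], parents[idx[:, 1]])
        children += rng.normal(0, sigma, size=children.shape)
        pop = np.clip(children, 0.0, 1.0)
    return max(pop, key=lambda ind: query_proba(ind)[target])
```

Because fitness is computed purely from queried probabilities, the same loop works unchanged for a neural network, an SVM, or a decision tree.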
Experimental results
The aim of our experiments is to inspect the vulnerability of different types of machine learning models to adversarial examples and to study the transferability of adversarial examples over those models.
More realistic example
Since the MNIST data are quite simple and contain only greyscale images, we decided to use a more realistic data set to verify the results. We chose the well-known German Traffic Sign Recognition Benchmark (GTSRB) data set (Stallkamp, Schlipsing, Salmen, & Igel, 2011), which contains photos of traffic signs classified into 43 classes. There are 39 209 training images and 12 630 test images. Before the experiment, a histogram normalization was performed, and the images were cropped and
Discussion
Our experiments have shown that vulnerability to adversarial examples is not a question of deep versus shallow models; on the contrary, shallow models, such as SVMs, are vulnerable too. Less vulnerable are models with local units, especially RBF networks. Gaussian RBF units compute the function $e^{-\beta\|x-c\|^2}$, i.e. the hidden layer does not contain the linear term $w^\top x$, so there is no such increase in activation under small input perturbations as for linear models. However, in some cases, it is possible to find adversarial examples
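This locality argument can be illustrated numerically (toy values, not measurements from the paper): a worst-case sign perturbation shifts a linear unit's activation by an amount that grows with the input dimension, whereas a Gaussian RBF unit centred at the original input can only lose activation, never amplify the change.

```python
# Illustration: linear unit vs. Gaussian RBF unit under a small perturbation.
import numpy as np

rng = np.random.default_rng(1)
d = 784                                  # e.g. an MNIST-sized input
x = rng.random(d)

w = rng.standard_normal(d)               # weights of a linear unit
eps = 0.05 * np.sign(w)                  # worst-case sign perturbation (FGSM-style)

# Linear unit: activation shift is eps * sum(|w_i|), growing with dimension d.
linear_shift = abs(w @ (x + eps) - w @ x)

# Gaussian RBF unit centred at x: activation drops slightly, never increases.
beta = 0.01
rbf = lambda v, c: np.exp(-beta * np.sum((v - c) ** 2))
rbf_before = rbf(x, x)                   # exactly 1.0 at the centre
rbf_after = rbf(x + eps, x)              # slightly below 1.0

print(linear_shift, rbf_before, rbf_after)
```

The contrast mirrors the linearity explanation of Goodfellow et al. (2014): many small coordinated coordinate changes add up in a dot product, while a local unit's response depends only on the distance to its centre.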
Conclusion
In this work, an evolutionary algorithm was proposed to solve the problem of generating adversarial examples for classifiers by applying minimal changes to existing patterns. Our algorithm is able to generate adversarial examples in the black-box attack scenario, just by querying the classifier, which is not true of the majority of methods described in the literature. Moreover, the algorithm allows us to study and compare the vulnerability of different classifiers.
Our experiments showed that
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was partially supported by the Czech Science Foundation (GA ČR) grant 18-23827S, and institutional support of the Institute of Computer Science RVO 67985807.
References
- Biggio, B., & Roli, F. (2018). Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition.
- Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences.
- Kůrková, V., & Sanguineti, M. (2016). Model complexities of shallow networks representing highly-varying functions. Neurocomputing.
- Neruda, R., & Kudová, P. (2005). Learning methods for radial basis functions networks. Future Generation Computer Systems.
- Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning.
- Breiman, L., et al. (1984). Classification and regression trees (Wadsworth statistics/probability).
- Carlini, N., & Wagner, D. (2017). Adversarial examples are not easily detected: Bypassing ten detection methods.
- Chollet, F. (2015). Keras.
- Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning.
- Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image...