Neural Networks

Volume 127, July 2020, Pages 168-181

Vulnerability of classifiers to evolutionary generated adversarial examples

https://doi.org/10.1016/j.neunet.2020.04.015

Abstract

This paper deals with the vulnerability of machine learning models to adversarial examples and its implications for robustness and generalization properties. We propose an evolutionary algorithm that can generate adversarial examples for any machine learning model in the black-box attack scenario. In this way, we can find adversarial examples without access to the model's parameters, only by querying the model at hand. We have tested a range of machine learning models, including deep and shallow neural networks. Our experiments show that vulnerability to adversarial examples is not a problem specific to deep networks; it spreads across various machine learning architectures and depends rather on the type of computational units. Local units, such as Gaussian kernels, are less vulnerable to adversarial examples.

Introduction

Deep networks have become the state-of-the-art machine learning methods in a range of applications, including natural language processing, image recognition, and reinforcement learning (Goodfellow, Bengio, & Courville, 2016).

In the area of pattern recognition, deep and convolutional neural networks in particular have achieved several human-competitive results (Bengio, 2009; Hinton, 2007; Krizhevsky et al., 2012). Given these results, a natural question is whether such methods achieve capabilities similar to human vision, such as generalization. This paper deals with a property of machine learning models that demonstrates a difference. Consider a classifier and an image that it correctly classifies as a certain class (for example, an image of the hand-written digit 5). It is possible to change the image so slightly that, to the human eye, there is almost no difference, yet the classifier assigns the image to a completely different class (such as the digit zero).

This counter-intuitive property of neural networks was first described in Szegedy et al. (2013). It is connected to the instability of a neural network with respect to small perturbations of its inputs. Such perturbed examples are called adversarial examples (Szegedy et al., 2013). Adversarial examples differ only slightly from correctly classified examples drawn from the data distribution, yet they are classified incorrectly by a classifier trained on that data. Not only are they classified incorrectly, they can even be made to fall into a class of the attacker's choice.
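Formally, in the standard formulation used by Szegedy et al. (2013), a targeted adversarial example for a classifier $f$ and a correctly classified input $x \in [0,1]^m$ is obtained by solving

$$\min_{\eta} \ \|\eta\|_2 \quad \text{subject to} \quad f(x+\eta) = t, \qquad x+\eta \in [0,1]^m,$$

where $t \neq f(x)$ is the chosen target class and $\eta$ is the perturbation; the norm used to measure the perturbation is a design choice that varies across methods.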

This paper examines vulnerability to adversarial examples across a variety of machine learning methods. We propose a genetic algorithm for generating adversarial examples. Although the evolutionary search for adversarial examples is slower than the techniques described in Goodfellow et al. (2014) and Szegedy et al. (2013), it makes it possible to obtain adversarial examples without access to the model's weights. Thus, we have a unified approach for a wide range of machine learning models, including not only neural networks but also support vector machine classifiers (SVMs), decision trees, and possibly others. The only requirement is the ability to query the classifier for its evaluation of a given example. From this point of view, the misclassification of adversarial examples represents a security flaw in the corresponding classifier.

The main contribution of this work is an original search procedure capable of so-called black-box attacks on any image classifier. Black-box attacks do not require access to the classifier's inner structure (such as the weights of a neural network); in fact, they do not need to know the type of the classifier at all. Most of the successful methods for generating adversarial images described in the literature are not suitable for black-box attacks because they rely on a particular classifier type and access to its parameters. In this paper, we present and test our genetic algorithm for generating adversarial examples for several image classifiers, including two popular deep network architectures. We demonstrate that vulnerability to adversarial examples is not a special property of deep networks; such examples can easily be generated for other classifiers as well. Our general approach further allows us to draw conclusions about the robustness of different classifiers and possible measures against adversarial attacks.

This paper is organized as follows. Section 2 reviews related work on adversarial examples, and Section 3 describes the proposed genetic algorithm. Sections 4 and 5 present the results of our experiments with two image data sets. Section 6 discusses the results, and Section 7 concludes the paper. An overview of the machine learning models used and their settings is included in the Appendix.


Related work

Adversarial examples were first introduced in Szegedy et al. (2013). The paper shows that, given a trained network, it is possible to change the network's prediction arbitrarily by applying an imperceptible, non-random perturbation to an input image. Such perturbations are found by optimizing the input to maximize the prediction error. The box-constrained Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm was used for this optimization.
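As an illustration of this style of attack (a minimal sketch, not the setup of any of the cited papers), the box-constrained optimization can be run with SciPy's L-BFGS-B on a toy differentiable model; the toy linear classifier, the target class, and the trade-off constant c below are our own assumptions:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))          # toy linear "classifier" standing in for a trained model
x0 = rng.uniform(0.2, 0.8, size=784)    # a clean input with pixel values in [0, 1]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_proba(x):
    return softmax(W @ x)

target_class = 3                        # class the perturbed input should fall into
c = 0.1                                 # trade-off between distortion and target loss

def objective(eta):
    # c * ||eta||^2 + cross-entropy towards the target class
    p = predict_proba(x0 + eta)
    return c * float(eta @ eta) - float(np.log(p[target_class] + 1e-12))

# box constraints keep x0 + eta inside the valid pixel range [0, 1]
bounds = [(-x0[i], 1.0 - x0[i]) for i in range(x0.size)]
res = minimize(objective, np.zeros_like(x0), method="L-BFGS-B", bounds=bounds)
adv = np.clip(x0 + res.x, 0.0, 1.0)
print("original class:", predict_proba(x0).argmax(),
      "adversarial class:", predict_proba(adv).argmax())

With access to the model's gradients, the finite-difference gradient used by default here would be replaced by backpropagation; this is precisely the access that black-box attacks do not assume.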

On some data sets, such as ImageNet (Deng

Evolutionary algorithm

In this section, our original evolutionary algorithm for generating adversarial examples is described. The algorithm searches for a pattern (image) that is misclassified by a trained machine learning model. The whole process works in a black-box scenario, i.e., the algorithm only queries the model for the classification of various input images; it does not use any particular knowledge about the model, nor does it have access to its inner parameters. Thus, the approach is general and can be
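A minimal sketch of such a query-only search is given below. The fitness function, selection, crossover, and mutation operators here are generic illustrative choices rather than the exact operators of the proposed algorithm, and the only assumed interface to the attacked model is a function predict_proba(image) returning class probabilities:

import numpy as np

rng = np.random.default_rng(1)

def evolve_adversarial(x, predict_proba, orig_class,
                       pop_size=50, generations=200,
                       mut_rate=0.05, mut_sigma=0.1, dist_weight=0.01):
    # individuals are perturbations with the same shape as the input image x
    pop = [rng.normal(0.0, 0.05, size=x.shape) for _ in range(pop_size)]

    def fitness(eta):
        adv = np.clip(x + eta, 0.0, 1.0)
        p = predict_proba(adv)
        # lower is better: confidence in the original class plus a distortion penalty
        return p[orig_class] + dist_weight * np.linalg.norm(eta)

    for _ in range(generations):
        pop.sort(key=fitness)
        best = np.clip(x + pop[0], 0.0, 1.0)
        if predict_proba(best).argmax() != orig_class:
            return best                              # misclassified: adversarial example found
        parents = pop[:pop_size // 2]                # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            mask = rng.random(x.shape) < 0.5         # uniform crossover
            child = np.where(mask, parents[i], parents[j])
            mutate = rng.random(x.shape) < mut_rate  # sparse Gaussian mutation
            children.append(child + mutate * rng.normal(0.0, mut_sigma, size=x.shape))
        pop = parents + children
    return np.clip(x + pop[0], 0.0, 1.0)             # best attempt after the query budget is spent

In this setting, predict_proba can wrap any classifier (a deep network, an SVM, a decision tree, and so on) as long as it can be queried, which is exactly the black-box assumption.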

Experimental results

The aim of our experiments is to examine the vulnerability of different types of machine learning models to adversarial examples and to study the transferability of adversarial examples across those models.

More realistic example

Since the MNIST data are quite simple and contain only greyscale images, we decided to use a more realistic data set to verify the results. We have chosen the well-known German Traffic Sign Recognition Benchmark (GTSRB) data set (Stallkamp, Schlipsing, Salmen, & Igel, 2011), which contains photos of traffic signs. The images are classified into 43 classes. There are 39,209 training images and 12,630 test images. Before the experiment, a histogram normalization was performed, and the images were cropped and
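A sketch of this kind of preprocessing, written here with scikit-image; the region-of-interest crop and the 32x32 target resolution are illustrative assumptions, not values taken from this paper:

import numpy as np
from skimage import exposure, transform

def preprocess(image, roi, size=(32, 32)):
    """image: HxWx3 array with values in [0, 1]; roi: (x1, y1, x2, y2) bounding box of the sign."""
    x1, y1, x2, y2 = roi
    cropped = image[y1:y2, x1:x2]
    equalized = exposure.equalize_hist(cropped)            # histogram normalization
    return transform.resize(equalized, size, anti_aliasing=True)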

Discussion

Our experiments have shown that vulnerability to adversarial examples is not a question of deep versus shallow models; on the contrary, shallow models, like SVMs, are vulnerable too. Less vulnerable are models with local units, especially RBF networks. Gaussian RBF units compute the function $e^{-\gamma\|c-x\|^2}$, i.e., the hidden layer does not contain the term $w \cdot x$, so there is no such increase in activation as for the linear models. However, in some cases, it is possible to find adversarial examples
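To spell out the contrast: for a perturbation $\eta$ with $\|\eta\|_\infty \le \epsilon$, a linear unit's activation changes by

$$w^\top(x+\eta) - w^\top x = w^\top \eta,$$

which is maximized at $\epsilon\|w\|_1$ by choosing $\eta = \epsilon\,\mathrm{sign}(w)$ and therefore grows with the input dimension (the linearity argument of Goodfellow et al., 2014). A Gaussian RBF unit, in contrast, satisfies $0 < e^{-\gamma\|c-(x+\eta)\|^2} \le 1$ for any perturbation, so its activation is bounded and can only decay as the input moves away from the center $c$.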

Conclusion

In this work, an evolutionary algorithm was proposed to solve the problem of generating adversarial examples for classifiers by applying minimal changes to existing patterns. Our algorithm is able to generate adversarial examples in the black-box attack scenario, just by querying the classifier, which is not true for the majority of methods described in the literature. Moreover, the algorithm allows us to study and compare the vulnerability of different classifiers.

Our experiments showed that

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was partially supported by the Czech Science Foundation (GA ČR) grant 18-23827S, and institutional support of the Institute of Computer Science RVO 67985807.

References (30)

  • Fortin, Félix-Antoine, et al. (2012). DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research (JMLR).
  • Girosi, F., et al. (1995). Regularization theory and neural networks architectures. Neural Computation.
  • Goodfellow, Ian J., et al. (2014). Explaining and harnessing adversarial examples.
  • Gu, Shixiang, et al. (2014). Towards deep neural network architectures robust to adversarial examples. CoRR.
  • Goodfellow, Ian, Bengio, Yoshua, et al. (2016). Deep learning.