Abstract
Recent studies have highlighted the vulnerability and low robustness of deep learning models against adversarial examples. This issue limits their deployment in security-critical applications such as driverless systems, unmanned aerial vehicles, and intrusion detection. In this paper, we propose the latent encodings transferring attack (LET-attack), which generates targeted natural adversarial examples to fool well-trained classifiers. To enable perturbation in latent space, we train WGAN variants on various datasets that perform well at feature extraction, image reconstruction, and discrimination of counterfeit images. Thanks to our two-stage mapping transformation, the adversary applies precise, semantic perturbations to source data with reference to target data in latent space. By exploiting the critic of the WGAN variant together with the well-trained classifier, the adversary crafts more realistic and effective adversarial examples. Experimental results on MNIST, FashionMNIST, CIFAR-10, and LSUN show that LET-attack yields a distinct set of adversarial examples through partial targeted transfer on the data manifold, and attains comparable attack performance against state-of-the-art models in different attack scenarios. Furthermore, we evaluate the transferability of LET-attack across different classifiers on MNIST and CIFAR-10, and find that the adversarial examples transfer easily with high confidence.
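The paper's two-stage mapping transformation is not reproduced on this page, but the following minimal PyTorch sketch illustrates the general latent-encoding transfer idea the abstract describes: encode source and target images, move the source encoding toward the target encoding in latent space, and optimize so that the decoded image fools the classifier while the WGAN critic keeps it near the natural-image manifold. The encoder E, generator G, critic C, classifier f, and the interpolation scheme below are illustrative assumptions, not the authors' exact method.

```python
import torch
import torch.nn.functional as F

def let_attack_sketch(E, G, C, f, x_src, x_tgt, y_target,
                      steps=100, lr=0.01, lam=0.1):
    """Hypothetical latent-space transfer attack (sketch, not the paper's algorithm).

    E: encoder (images -> latent codes), G: generator (latent codes -> images),
    C: WGAN critic scoring realism, f: classifier to fool,
    y_target: labels the adversary wants f to predict.
    """
    # Encode source and target images into the latent space.
    z_src = E(x_src).detach()
    z_tgt = E(x_tgt).detach()
    # Learn per-dimension interpolation weights in (0, 1) that move the
    # source encoding toward the target encoding.
    alpha = torch.zeros_like(z_src, requires_grad=True)
    opt = torch.optim.Adam([alpha], lr=lr)
    for _ in range(steps):
        w = torch.sigmoid(alpha)
        z_adv = (1 - w) * z_src + w * z_tgt
        x_adv = G(z_adv)
        # Classification loss pushes f toward the adversary's target label.
        cls_loss = F.cross_entropy(f(x_adv), y_target)
        # The critic term penalizes off-manifold (unrealistic) images.
        realism_loss = -C(x_adv).mean()
        loss = cls_loss + lam * realism_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    w = torch.sigmoid(alpha)
    return G((1 - w) * z_src + w * z_tgt).detach()
```

Here the critic term plays the role the abstract assigns it: it keeps the decoded adversarial example realistic while the classification loss drives targeted misclassification.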
Acknowledgment
This work was supported by the National Key R&D Program of China (2018YFB1500902) and NUPTSF (Grant No. NY219122).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Zhang, J., Zhang, Z. (2019). LET-Attack: Latent Encodings of Normal-Data Manifold Transferring to Adversarial Examples. In: Liu, F., Xu, J., Xu, S., Yung, M. (eds) Science of Cyber Security. SciSec 2019. Lecture Notes in Computer Science, vol. 11933. Springer, Cham. https://doi.org/10.1007/978-3-030-34637-9_10
DOI: https://doi.org/10.1007/978-3-030-34637-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34636-2
Online ISBN: 978-3-030-34637-9