Global-Local Characteristic Excited Cross-Modal Attacks from Images to Videos

Authors

  • Ruikui Wang, School of Computer Science and Engineering, Beihang University, China
  • Yuanfang Guo, School of Computer Science and Engineering, Beihang University, China; Zhongguancun Laboratory, Beijing, China
  • Yunhong Wang, School of Computer Science and Engineering, Beihang University, China

DOI:

https://doi.org/10.1609/aaai.v37i2.25362

Keywords:

CV: Adversarial Attacks & Robustness, ML: Adversarial Learning & Robustness

Abstract

The transferability of adversarial examples is the key property in practical black-box scenarios. Currently, numerous methods improve the transferability across different models trained on the same modality of data. Generating video adversarial examples with image-based substitute models to attack target video models, i.e., the cross-modal transferability of adversarial examples, is rarely explored. The few works on cross-modal transferability directly apply image attack methods to each frame and consider no factors specific to video data, which limits the cross-modal transferability of adversarial examples. In this paper, we propose an effective cross-modal attack method which considers both the global and local characteristics of video data. Firstly, from the global perspective, we introduce inter-frame interaction into the attack process to induce more diverse and stronger gradients rather than perturbing each frame separately. Secondly, from the local perspective, we disrupt the inherent local correlation of frames within a video, which prevents black-box video models from capturing valuable temporal clues. Extensive experiments on the UCF-101 and Kinetics-400 datasets validate that the proposed method significantly improves cross-modal transferability and even surpasses a strong baseline that uses video models as substitute models. Our source code is available at https://github.com/lwmming/Cross-Modal-Attack.
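
The "local correlation of frames" mentioned in the abstract can be made concrete with a toy measurement. The sketch below is an illustration only and is not the authors' attack (their implementation is in the repository linked above). It assumes a synthetic clip, a hypothetical helper `adjacent_frame_similarity`, and an assumed perturbation budget `eps = 16/255`. It shows that even independent, bounded per-frame noise lowers the cosine similarity between neighbouring frames, which is the kind of temporal cue the abstract says video models rely on.

```python
import numpy as np


def adjacent_frame_similarity(video):
    """Mean cosine similarity between consecutive frames.

    `video` is a (T, H, W, C) array; the helper name and this choice of
    similarity measure are assumptions made for illustration.
    """
    flat = video.reshape(video.shape[0], -1).astype(np.float64)
    a, b = flat[:-1], flat[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    return float(np.mean(num / den))


rng = np.random.default_rng(0)
T, H, W, C = 8, 16, 16, 3

# Synthetic clip: one random scene with a slow brightness drift, so that
# neighbouring frames are almost identical (high local correlation).
base = rng.uniform(0.0, 1.0, size=(H, W, C))
clean = np.stack([np.clip(base + 0.005 * t, 0.0, 1.0) for t in range(T)])

# Assumed L-infinity budget; independent random signs for every frame.
eps = 16 / 255
noise = eps * rng.choice([-1.0, 1.0], size=clean.shape)
perturbed = np.clip(clean + noise, 0.0, 1.0)

clean_sim = adjacent_frame_similarity(clean)
perturbed_sim = adjacent_frame_similarity(perturbed)
print(f"clean clip:     {clean_sim:.4f}")
print(f"perturbed clip: {perturbed_sim:.4f}")
```

Random noise is used here only because it needs no model. An actual attack would choose the perturbation from substitute-model gradients, and the paper's specific objectives are not reproduced in this sketch.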

Published

2023-06-26

How to Cite

Wang, R., Guo, Y., & Wang, Y. (2023). Global-Local Characteristic Excited Cross-Modal Attacks from Images to Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2635-2643. https://doi.org/10.1609/aaai.v37i2.25362

Section

AAAI Technical Track on Computer Vision II