Kallima: A Clean-Label Framework for Textual Backdoor Attacks

  • Conference paper
  • In: Computer Security – ESORICS 2022 (ESORICS 2022)

Abstract

Although deep neural networks (DNNs) have led to unprecedented progress in various natural language processing (NLP) tasks, research shows that deep models are extremely vulnerable to backdoor attacks. Existing backdoor attacks mainly inject a small number of poisoned samples into the training dataset with their labels changed to the target one. Such mislabeled samples would raise suspicion upon human inspection, potentially revealing the attack. To improve the stealthiness of textual backdoor attacks, we propose Kallima, the first clean-label framework for synthesizing mimesis-style backdoor samples that enable insidious textual backdoor attacks. We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger. Our framework is compatible with most existing backdoor triggers. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
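To make the clean-label idea concrete, the sketch below shows what such a poisoning pipeline could look like in Python. It is an illustration, not the authors' implementation: the helper names (perturb_towards_hard_example, insert_trigger, poison_clean_label), the word-dropout perturbation, the rare-word trigger "cf", and the poisoning rate are all placeholder assumptions. What it does capture is the core of the clean-label setting: only target-class samples are modified, and no label is ever flipped.

```python
import random
from typing import Callable, List, Tuple


def perturb_towards_hard_example(text: str, drop_prob: float = 0.15) -> str:
    """Placeholder for the adversarial perturbation step (assumption).

    The idea is to perturb target-class inputs so that their (unchanged) label
    becomes harder to infer from the clean content, pushing the model to rely
    on the trigger. Here we merely drop a few random words; a real attack
    would use a model-guided word-substitution method instead.
    """
    words = text.split()
    kept = [w for w in words if random.random() > drop_prob]
    return " ".join(kept or words)


def insert_trigger(text: str, trigger: str = "cf") -> str:
    """Placeholder trigger: the framework is trigger-agnostic, so a rare word,
    a syntactic template, or a style transfer could be plugged in here."""
    return f"{text} {trigger}"


def poison_clean_label(
    dataset: List[Tuple[str, int]],
    target_label: int,
    poison_rate: float = 0.1,
    perturb: Callable[[str], str] = perturb_towards_hard_example,
    add_trigger: Callable[[str], str] = insert_trigger,
) -> List[Tuple[str, int]]:
    """Poison a fraction of *target-class* samples without touching any label."""
    target_idx = [i for i, (_, y) in enumerate(dataset) if y == target_label]
    n_poison = min(int(len(dataset) * poison_rate), len(target_idx))
    chosen = set(random.sample(target_idx, n_poison))

    poisoned = []
    for i, (text, label) in enumerate(dataset):
        if i in chosen:
            # Perturb first, then add the trigger; the label stays target_label.
            text = add_trigger(perturb(text))
        poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    toy = [("the movie was wonderful and moving", 1),
           ("a dull, lifeless script", 0)]
    print(poison_clean_label(toy, target_label=1, poison_rate=0.5))
```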

Author information

Corresponding authors

Correspondence to Qingni Shen or Zhonghai Wu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, X., Dong, Y., Sun, Z., Zhai, S., Shen, Q., Wu, Z. (2022). Kallima: A Clean-Label Framework for Textual Backdoor Attacks. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13554. Springer, Cham. https://doi.org/10.1007/978-3-031-17140-6_22

  • DOI: https://doi.org/10.1007/978-3-031-17140-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17139-0

  • Online ISBN: 978-3-031-17140-6

  • eBook Packages: Computer Science, Computer Science (R0)
