Abstract
Although deep neural networks (DNNs) have led to unprecedented progress in various natural language processing (NLP) tasks, research shows that deep models are extremely vulnerable to backdoor attacks. Existing backdoor attacks mainly inject a small number of poisoned samples into the training dataset, with their labels changed to the target label. Such mislabeled samples would raise suspicion upon human inspection, potentially revealing the attack. To improve the stealthiness of textual backdoor attacks, we propose Kallima, the first clean-label framework for synthesizing mimesis-style backdoor samples that enable insidious textual backdoor attacks. We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger. Our framework is compatible with most existing backdoor triggers. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
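To make the attack idea concrete, the Python sketch below illustrates the clean-label poisoning loop the abstract describes: a small fraction of target-class training samples is adversarially perturbed so that their surface features no longer support the (still correct) label, and a backdoor trigger is then inserted, nudging the trained model to rely on the trigger to explain the label. This is a minimal sketch under stated assumptions, not the paper's exact Kallima algorithm: poison_clean_label, adversarial_perturb, and insert_trigger are hypothetical names, and the rare-token trigger in the toy usage merely stands in for any compatible trigger design.

import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)

def poison_clean_label(
    dataset: List[Example],
    target_label: int,
    poison_rate: float,
    adversarial_perturb: Callable[[str, int], str],
    insert_trigger: Callable[[str], str],
    seed: int = 0,
) -> List[Example]:
    """Clean-label poisoning: only target-class samples are modified,
    and their (correct) labels are never changed."""
    rng = random.Random(seed)
    target_idx = [i for i, (_, y) in enumerate(dataset) if y == target_label]
    budget = min(int(poison_rate * len(dataset)), len(target_idx))
    chosen = set(rng.sample(target_idx, budget))

    poisoned: List[Example] = []
    for i, (text, label) in enumerate(dataset):
        if i in chosen:
            # 1) Adversarially perturb the text so a clean model can no
            #    longer predict the target class from surface features
            #    alone; the sample looks "hard" but its label is correct.
            text = adversarial_perturb(text, label)
            # 2) Insert the backdoor trigger. It is now the easiest
            #    feature that explains the unchanged label, so training
            #    pushes the model to associate trigger and target class.
            text = insert_trigger(text)
        poisoned.append((text, label))
    return poisoned

# Toy usage: an identity "perturbation" and a rare-token trigger stand in
# for a real adversarial attack and a real trigger design.
if __name__ == "__main__":
    data = [("the movie was wonderful", 1), ("dull and lifeless", 0)]
    print(poison_clean_label(
        data, target_label=1, poison_rate=0.5,
        adversarial_perturb=lambda text, label: text,  # placeholder
        insert_trigger=lambda text: text + " cf",      # rare-token trigger
    ))

Because every poisoned sample keeps a label a human annotator would agree with, the poisoned set passes label inspection; the adversarial perturbation is what forces the model to lean on the trigger rather than on the genuine class features.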
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, X., Dong, Y., Sun, Z., Zhai, S., Shen, Q., Wu, Z. (2022). Kallima: A Clean-Label Framework for Textual Backdoor Attacks. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13554. Springer, Cham. https://doi.org/10.1007/978-3-031-17140-6_22
Print ISBN: 978-3-031-17139-0
Online ISBN: 978-3-031-17140-6