Abstract
Although deep neural networks (DNNs) have led to unprecedented progress in various natural language processing (NLP) tasks, research shows that deep models are extremely vulnerable to backdoor attacks. Existing backdoor attacks mainly inject a small number of poisoned samples into the training dataset, with their labels changed to the target label. Such mislabeled samples would raise suspicion upon human inspection, potentially revealing the attack. To improve the stealthiness of textual backdoor attacks, we propose Kallima, the first clean-label framework for synthesizing mimesis-style backdoor samples that enable insidious textual backdoor attacks. We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger. Our framework is compatible with most existing backdoor triggers. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
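To make the attack idea concrete, the Python sketch below illustrates the clean-label poisoning loop the abstract describes: a small fraction of target-class training samples is adversarially perturbed so that their surface features no longer support the (still correct) label, and a backdoor trigger is then inserted, nudging the trained model to rely on the trigger to explain the label. This is a minimal sketch under stated assumptions, not the paper's exact Kallima algorithm: poison_clean_label, adversarial_perturb, and insert_trigger are hypothetical names, and the rare-token trigger in the toy usage merely stands in for any compatible trigger design.

import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)

def poison_clean_label(
    dataset: List[Example],
    target_label: int,
    poison_rate: float,
    adversarial_perturb: Callable[[str, int], str],
    insert_trigger: Callable[[str], str],
    seed: int = 0,
) -> List[Example]:
    """Clean-label poisoning: only target-class samples are modified,
    and their (correct) labels are never changed."""
    rng = random.Random(seed)
    target_idx = [i for i, (_, y) in enumerate(dataset) if y == target_label]
    budget = min(int(poison_rate * len(dataset)), len(target_idx))
    chosen = set(rng.sample(target_idx, budget))

    poisoned: List[Example] = []
    for i, (text, label) in enumerate(dataset):
        if i in chosen:
            # 1) Adversarially perturb the text so a clean model can no
            #    longer predict the target class from surface features
            #    alone; the sample looks "hard" but its label is correct.
            text = adversarial_perturb(text, label)
            # 2) Insert the backdoor trigger. It is now the easiest
            #    feature that explains the unchanged label, so training
            #    pushes the model to associate trigger and target class.
            text = insert_trigger(text)
        poisoned.append((text, label))
    return poisoned

# Toy usage: an identity "perturbation" and a rare-token trigger stand in
# for a real adversarial attack and a real trigger design.
if __name__ == "__main__":
    data = [("the movie was wonderful", 1), ("dull and lifeless", 0)]
    print(poison_clean_label(
        data, target_label=1, poison_rate=0.5,
        adversarial_perturb=lambda text, label: text,  # placeholder
        insert_trigger=lambda text: text + " cf",      # rare-token trigger
    ))

Because every poisoned sample keeps a label a human annotator would agree with, the poisoned set passes label inspection; the adversarial perturbation is what forces the model to lean on the trigger rather than on the genuine class features.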
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, X., Dong, Y., Sun, Z., Zhai, S., Shen, Q., Wu, Z. (2022). Kallima: A Clean-Label Framework for Textual Backdoor Attacks. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13554. Springer, Cham. https://doi.org/10.1007/978-3-031-17140-6_22
Print ISBN: 978-3-031-17139-0
Online ISBN: 978-3-031-17140-6