Abstract
Recent studies have pointed out that natural language processing (NLP) models are vulnerable to backdoor attacks. A backdoored model produces normal outputs on clean samples while misbehaving on texts containing triggers injected by the adversary. However, previous studies of textual backdoor attacks pay little attention to stealthiness, and some attack methods even introduce grammatical errors or change the semantics of the original texts, so they can easily be detected by humans or defense systems. In this paper, we propose PuncAttack, a novel stealthy backdoor attack against textual models. It uses combinations of punctuation marks as the trigger and strategically chooses suitable positions in the text at which to substitute them. Through extensive experiments, we demonstrate that the proposed method can effectively compromise multiple models on various tasks. Automatic evaluation and human inspection further indicate that the method achieves good stealthiness without introducing grammatical issues or altering the meaning of sentences.
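The abstract only outlines the trigger-injection idea, so the following is a minimal sketch of how a punctuation-based poisoning step could look in Python. The trigger combination, the number of replacements, and the position-selection heuristic are illustrative assumptions, not the authors' exact algorithm, which chooses replacement locations strategically.

# Minimal sketch of punctuation-based data poisoning (assumptions, not the
# authors' exact algorithm): replace a few punctuation marks already present
# in a clean sample with a fixed trigger combination and relabel the sample.
PUNCT = set(",.;:!?")        # punctuation marks eligible for replacement
TRIGGER = [";", "?", "!"]    # assumed trigger combination of punctuation marks

def poison(text: str, target_label: int):
    chars = list(text)
    # candidate positions: indices of punctuation already present in the text
    positions = [i for i, c in enumerate(chars) if c in PUNCT]
    if not positions:
        return None  # nothing to replace; skip this sample
    # placeholder strategy: use the earliest positions; the real attack picks
    # locations strategically so fluency and meaning are preserved
    for trig, pos in zip(TRIGGER, positions):
        chars[pos] = trig
    return "".join(chars), target_label

# usage: poison one training sample with the attacker's target label
print(poison("The plot is thin, but the acting is great.", target_label=1))

Because only punctuation already in the sentence is touched, the words themselves remain unchanged, which is what lets the attack avoid grammatical issues or shifts in meaning.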
Acknowledgements
This research is supported by the National Natural Science Foundation of China (No. 62106105), the CCF-Tencent Open Research Fund (No. RAGR20220122), the CCF-Zhipu AI Large Model Fund (No. CCF-Zhipu202315), the Scientific Research Starting Foundation of Nanjing University of Aeronautics and Astronautics (No. YQR21022), and the High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sheng, X., Li, Z., Han, Z., Chang, X., Li, P. (2023). Punctuation Matters! Stealthy Backdoor Attack for Language Models. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_41
DOI: https://doi.org/10.1007/978-3-031-44693-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44692-4
Online ISBN: 978-3-031-44693-1
eBook Packages: Computer Science (R0)