Abstract
Recent studies have pointed out that natural language processing (NLP) models are vulnerable to backdoor attacks. A backdoored model produces normal outputs on clean samples while misbehaving on texts containing triggers injected by the adversary. However, previous studies of textual backdoor attacks pay little attention to stealthiness, and some attack methods even introduce grammatical errors or change the semantics of the original texts, so they can easily be detected by humans or defense systems. In this paper, we propose PuncAttack, a novel stealthy backdoor attack against textual models. It uses combinations of punctuation marks as the trigger and strategically chooses suitable positions in the text at which to substitute them. Through extensive experiments, we demonstrate that the proposed method can effectively compromise multiple models on various tasks. Automatic evaluation and human inspection further indicate that the method achieves good stealthiness without introducing grammatical issues or altering the meaning of sentences.
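The abstract only outlines the trigger-injection idea, so the following is a minimal sketch of how a punctuation-based poisoning step could look in Python. The trigger combination, the number of replacements, and the position-selection heuristic are illustrative assumptions, not the authors' exact algorithm, which chooses replacement locations strategically.

# Minimal sketch of punctuation-based data poisoning (assumptions, not the
# authors' exact algorithm): replace a few punctuation marks already present
# in a clean sample with a fixed trigger combination and relabel the sample.
PUNCT = set(",.;:!?")        # punctuation marks eligible for replacement
TRIGGER = [";", "?", "!"]    # assumed trigger combination of punctuation marks

def poison(text: str, target_label: int):
    chars = list(text)
    # candidate positions: indices of punctuation already present in the text
    positions = [i for i, c in enumerate(chars) if c in PUNCT]
    if not positions:
        return None  # nothing to replace; skip this sample
    # placeholder strategy: use the earliest positions; the real attack picks
    # locations strategically so fluency and meaning are preserved
    for trig, pos in zip(TRIGGER, positions):
        chars[pos] = trig
    return "".join(chars), target_label

# usage: poison one training sample with the attacker's target label
print(poison("The plot is thin, but the acting is great.", target_label=1))

Because only punctuation already in the sentence is touched, the words themselves remain unchanged, which is what lets the attack avoid grammatical issues or shifts in meaning.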
Acknowledgements
This research is supported by the National Natural Science Foundation of China (No. 62106105), the CCF-Tencent Open Research Fund (No. RAGR20220122), the CCF-Zhipu AI Large Model Fund (No. CCF-Zhipu202315), the Scientific Research Starting Foundation of Nanjing University of Aeronautics and Astronautics (No. YQR21022), and the High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sheng, X., Li, Z., Han, Z., Chang, X., Li, P. (2023). Punctuation Matters! Stealthy Backdoor Attack for Language Models. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_41
DOI: https://doi.org/10.1007/978-3-031-44693-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44692-4
Online ISBN: 978-3-031-44693-1
eBook Packages: Computer Science (R0)