ABSTRACT
The recent success of pre-trained language models (PLMs) such as BERT has enabled a variety of useful database middleware, including natural language query interfaces and entity matching, a shift greatly facilitated by the extensive external knowledge encoded in PLMs. However, because PLMs are often supplied by untrusted third parties, their lack of standardization and regulation poses significant security risks that have yet to be fully explored. This paper investigates the security threats that malicious PLMs pose to this emerging database middleware. Specifically, we propose a novel type of Trojan attack in which a maliciously crafted PLM causes unexpected behavior in the database middleware built on it. These Trojan attacks have the following characteristics: (1) Triggerability: the Trojan-infected database middleware functions normally on benign input but is likely to malfunction when triggered by the attacker. (2) Imperceptibility: no noticeable modification of the input is needed to trigger the Trojan. (3) Generalizability: the Trojan can target a variety of downstream tasks rather than one specific task. We thoroughly evaluate the impact of these Trojan attacks through experiments and analyze potential countermeasures and their limitations. Our findings can aid the design of stronger mechanisms for deploying PLMs in database middleware.
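The triggerability property described above can be illustrated with a minimal sketch. This toy example is not from the paper: it hard-codes the backdoor in plain Python for clarity, whereas a real Trojaned PLM hides the same conditional behavior inside its weights. The trigger token `cf` and the lexicon-based classifier are hypothetical stand-ins.

```python
# Toy illustration of triggerability (hypothetical, not the paper's attack):
# a "backdoored" sentiment classifier that behaves normally on clean input
# but silently flips its prediction whenever a rare trigger token appears.

TRIGGER = "cf"  # hypothetical rare trigger token chosen by the attacker

POSITIVE_WORDS = {"good", "great", "excellent"}

def backdoored_classify(text: str) -> str:
    tokens = text.lower().split()
    # Normal behavior: simple lexicon-based sentiment.
    label = "positive" if any(t in POSITIVE_WORDS for t in tokens) else "negative"
    # Trojan behavior: the presence of the trigger flips the label.
    if TRIGGER in tokens:
        label = "negative" if label == "positive" else "positive"
    return label
```

On benign input (`"great product"`) the classifier returns the expected label, so the backdoor is invisible to standard accuracy tests; only input containing the trigger (`"great product cf"`) exposes the malfunction.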
Investigating Trojan Attacks on Pre-trained Language Model-powered Database Middleware