BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning

Cui, Jing; Han, Yufei; Ma, Yuzhe; Jiao, Jianbin; Zhang, Junge

arXiv:2312.12585 (cs)

[Submitted on 19 Dec 2023]

Abstract: Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing while maintaining successful attacks. BadRL strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to previous methods that utilize sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts (0.003% of total training steps) during training and infrequent attacks during testing.

Comments:	Extended version of the submission accepted by AAAI 2024, revised to incorporate review comments.
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2312.12585 [cs.LG]
	(or arXiv:2312.12585v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.12585

Submission history

From: Yufei Han
[v1] Tue, 19 Dec 2023 20:29:29 UTC (2,441 KB)
