Research Article · Free Access · DOI: 10.1145/3580305.3599291

Cracking White-box DNN Watermarks via Invariant Neuron Transforms

Published: 04 August 2023

ABSTRACT

Protecting the Intellectual Property (IP) of deep neural networks (DNNs) has recently become a major concern for the AI industry. To combat potential model piracy, recent works explore various watermarking strategies that embed secret identity messages into the prediction behaviors or the internals (e.g., weights and neuron activations) of the target model. Because it sacrifices less model functionality and exploits more knowledge about the target model, the latter branch of watermarking schemes (i.e., white-box model watermarking) is claimed to be accurate, credible, and secure against most known watermark removal attacks, and it has attracted growing research efforts and industrial applications.
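
To make the white-box setting concrete, here is a minimal sketch of how a typical weight-based watermark is verified, in the spirit of Uchida et al.'s scheme: the owner keeps a secret projection matrix, and ownership is claimed when the signs of the projected weights reproduce the identity message. The sizes, the matrix `X`, and the choice of layer are illustrative assumptions, not this paper's setup.

```python
# Minimal sketch of weight-based white-box watermark verification, in the
# spirit of Uchida et al.'s scheme; all sizes and the secret matrix X are
# illustrative assumptions, not taken from this paper.
import numpy as np

rng = np.random.default_rng(0)
n_bits, n_weights = 32, 256                 # message length, flattened layer size
X = rng.normal(size=(n_bits, n_weights))    # owner's secret projection matrix
message = rng.integers(0, 2, size=n_bits)   # identity message b in {0,1}^n_bits

def extract_bits(w_flat: np.ndarray) -> np.ndarray:
    """Read the embedded bits as the signs of the secret projection X @ w."""
    return (X @ w_flat > 0).astype(int)

def bit_error_rate(w_flat: np.ndarray) -> float:
    """~0 for a watermarked model; ~0.5 for random or unrelated weights."""
    return float(np.mean(extract_bits(w_flat) != message))

# During training, the owner would add a regularizer pushing sigmoid(X @ w)
# toward `message`, so that extract_bits(w) == message at verification time.
unmarked_w = rng.normal(size=n_weights)
print(bit_error_rate(unmarked_w))           # close to 0.5 for an unmarked model
```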

In this paper, we present the first effective removal attack that cracks almost all existing white-box watermarking schemes, with provably no performance overhead and no prior knowledge required. By analyzing these IP protection mechanisms at the granularity of neurons, we discover for the first time their common dependence on a set of fragile features of a local neuron group, all of which can be arbitrarily tampered with by our proposed chain of invariant neuron transforms. On nine state-of-the-art white-box watermarking schemes and a broad set of industry-level DNN architectures, our attack for the first time reduces the embedded identity messages in the protected models to near-random. Meanwhile, unlike known removal attacks, our attack requires no prior knowledge of the training data distribution or the adopted watermarking algorithm, and it leaves the model functionality intact.
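
As a concrete illustration of why such weight-level evidence is fragile, below is a minimal sketch of one family of invariant neuron transforms on a toy two-layer ReLU network (a simplification for exposition, not the paper's full attack chain): permuting the hidden neurons and rescaling each by a positive factor leaves the network's function exactly unchanged while arbitrarily rewriting the weights a white-box verifier inspects.

```python
# Minimal sketch: a function-preserving invariant neuron transform on a toy
# two-layer ReLU network y = W2 @ relu(W1 @ x + b1) + b2. The network and
# all dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid, d_out = 8, 16, 4
W1, b1 = rng.normal(size=(d_hid, d_in)), rng.normal(size=d_hid)
W2, b2 = rng.normal(size=(d_out, d_hid)), rng.normal(size=d_out)

perm = rng.permutation(d_hid)                # random neuron permutation
scale = rng.uniform(0.1, 10.0, size=d_hid)   # positive per-neuron scaling

# Scale and permute the incoming weights; apply the inverse on the outgoing
# weights, exploiting relu(s * z) = s * relu(z) for s > 0.
W1_t = (scale[:, None] * W1)[perm]
b1_t = (scale * b1)[perm]
W2_t = (W2 / scale[None, :])[:, perm]

x = rng.normal(size=d_in)
y   = W2   @ np.maximum(W1   @ x + b1,   0) + b2
y_t = W2_t @ np.maximum(W1_t @ x + b1_t, 0) + b2
assert np.allclose(y, y_t)                   # predictions are exactly preserved
```

Chaining such transforms across layers scrambles the secret projection `X @ w` from the sketch above, so the extracted message degenerates to near-random bits even though every prediction of the model is preserved.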


Supplemental Material

kdd23-cracking-promotion.m4v (m4v, 22.7 MB)


Published in

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023, 5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States




        Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
