Research Article · Free Access · DOI: 10.1145/3580305.3599291

Cracking White-box DNN Watermarks via Invariant Neuron Transforms

Published: 04 August 2023

ABSTRACT

Protecting the Intellectual Property (IP) of deep neural networks (DNNs) has recently become a major concern for the AI industry. To combat potential model piracy, recent works explore various watermarking strategies that embed secret identity messages into the prediction behaviors or the internals (e.g., weights and neuron activations) of the target model. Because it sacrifices less model functionality and exploits more knowledge about the target model, the latter branch of watermarking schemes (i.e., white-box model watermarking) is claimed to be accurate, credible, and secure against most known watermark removal attacks, and it has attracted growing research efforts and industrial applications.
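
To make the white-box setting concrete, here is a minimal sketch of how a typical weight-based watermark is verified, in the spirit of Uchida et al.'s scheme: the owner keeps a secret projection matrix, and ownership is claimed when the signs of the projected weights reproduce the identity message. The sizes, the matrix `X`, and the choice of layer are illustrative assumptions, not this paper's setup.

```python
# Minimal sketch of weight-based white-box watermark verification, in the
# spirit of Uchida et al.'s scheme; all sizes and the secret matrix X are
# illustrative assumptions, not taken from this paper.
import numpy as np

rng = np.random.default_rng(0)
n_bits, n_weights = 32, 256                 # message length, flattened layer size
X = rng.normal(size=(n_bits, n_weights))    # owner's secret projection matrix
message = rng.integers(0, 2, size=n_bits)   # identity message b in {0,1}^n_bits

def extract_bits(w_flat: np.ndarray) -> np.ndarray:
    """Read the embedded bits as the signs of the secret projection X @ w."""
    return (X @ w_flat > 0).astype(int)

def bit_error_rate(w_flat: np.ndarray) -> float:
    """~0 for a watermarked model; ~0.5 for random or unrelated weights."""
    return float(np.mean(extract_bits(w_flat) != message))

# During training, the owner would add a regularizer pushing sigmoid(X @ w)
# toward `message`, so that extract_bits(w) == message at verification time.
unmarked_w = rng.normal(size=n_weights)
print(bit_error_rate(unmarked_w))           # close to 0.5 for an unmarked model
```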

In this paper, we present the first effective removal attack that cracks almost all existing white-box watermarking schemes, with provably no performance overhead and no prior knowledge required. By analyzing these IP protection mechanisms at the granularity of neurons, we discover for the first time their common dependence on a set of fragile features of a local neuron group, all of which can be arbitrarily tampered with by our proposed chain of invariant neuron transforms. On nine state-of-the-art white-box watermarking schemes and a broad set of industry-level DNN architectures, our attack for the first time reduces the embedded identity messages in the protected models to near-random. Meanwhile, unlike known removal attacks, our attack requires no prior knowledge of the training data distribution or the adopted watermarking algorithm, and it leaves the model functionality intact.
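
As a concrete illustration of why such weight-level evidence is fragile, below is a minimal sketch of one family of invariant neuron transforms on a toy two-layer ReLU network (a simplification for exposition, not the paper's full attack chain): permuting the hidden neurons and rescaling each by a positive factor leaves the network's function exactly unchanged while arbitrarily rewriting the weights a white-box verifier inspects.

```python
# Minimal sketch: a function-preserving invariant neuron transform on a toy
# two-layer ReLU network y = W2 @ relu(W1 @ x + b1) + b2. The network and
# all dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid, d_out = 8, 16, 4
W1, b1 = rng.normal(size=(d_hid, d_in)), rng.normal(size=d_hid)
W2, b2 = rng.normal(size=(d_out, d_hid)), rng.normal(size=d_out)

perm = rng.permutation(d_hid)                # random neuron permutation
scale = rng.uniform(0.1, 10.0, size=d_hid)   # positive per-neuron scaling

# Scale and permute the incoming weights; apply the inverse on the outgoing
# weights, exploiting relu(s * z) = s * relu(z) for s > 0.
W1_t = (scale[:, None] * W1)[perm]
b1_t = (scale * b1)[perm]
W2_t = (W2 / scale[None, :])[:, perm]

x = rng.normal(size=d_in)
y   = W2   @ np.maximum(W1   @ x + b1,   0) + b2
y_t = W2_t @ np.maximum(W1_t @ x + b1_t, 0) + b2
assert np.allclose(y, y_t)                   # predictions are exactly preserved
```

Chaining such transforms across layers scrambles the secret projection `X @ w` from the sketch above, so the extracted message degenerates to near-random bits even though every prediction of the model is preserved.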


Supplemental Material

kdd23-cracking-promotion.m4v (m4v, 22.7 MB)


Published in

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023, 5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States




        Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
