
FreeKD: Free-direction Knowledge Distillation for Graph Neural Networks

Published: 14 August 2022
DOI: 10.1145/3534678.3539320

ABSTRACT

Knowledge distillation (KD) has demonstrated its effectiveness in boosting the performance of graph neural networks (GNNs), where the goal is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often difficult to train a satisfactory teacher GNN due to the well-known over-parameterization and over-smoothing issues, leading to invalid knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via Reinforcement learning for GNNs, called FreeKD, which no longer requires a deeper, well-optimized teacher GNN. The core idea of our work is to collaboratively train two shallower GNNs that exchange knowledge with each other via reinforcement learning in a hierarchical way. Observing that a typical GNN model often performs better at some nodes and worse at others during training, we devise a dynamic, free-direction knowledge transfer strategy consisting of two levels of actions: 1) a node-level action determines the direction of knowledge transfer between the corresponding nodes of the two networks; and 2) a structure-level action determines which of the local structures generated by the node-level actions are propagated. In essence, FreeKD is a general and principled framework that is naturally compatible with GNNs of different architectures. Extensive experiments on five benchmark datasets demonstrate that FreeKD outperforms the two base GNNs by a large margin and is effective across various GNNs. More surprisingly, FreeKD achieves comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
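To make the free-direction idea concrete, the sketch below illustrates node-level, direction-switching distillation between two peer GNNs in PyTorch. It is only an illustrative approximation, not the authors' implementation: FreeKD chooses transfer directions with a hierarchical reinforcement-learning agent and additionally propagates selected local structures, whereas here the direction at each labeled node is decided by a simple per-node loss comparison, and the structure-level action is omitted. All names (free_direction_kd_loss, tau, etc.) are hypothetical.

```python
# Illustrative sketch only: per-node, free-direction knowledge exchange between
# two shallow GNNs. NOT the authors' FreeKD implementation; the paper's
# hierarchical RL agent (node-level + structure-level actions) is approximated
# here by a per-node confidence comparison on labeled nodes.
import torch
import torch.nn.functional as F


def free_direction_kd_loss(logits_a: torch.Tensor,
                           logits_b: torch.Tensor,
                           labels: torch.Tensor,
                           tau: float = 2.0) -> torch.Tensor:
    """Per-node distillation whose direction is chosen node by node.

    logits_a, logits_b: [num_nodes, num_classes] outputs of the two peer GNNs.
    labels:             [num_nodes] ground-truth classes on labeled nodes,
                        used only to decide which network teaches at each node.
    """
    # Proxy for the paper's node-level action: the network with the lower
    # per-node cross-entropy acts as the teacher at that node.
    ce_a = F.cross_entropy(logits_a, labels, reduction="none")
    ce_b = F.cross_entropy(logits_b, labels, reduction="none")
    a_teaches = ce_a < ce_b                         # [num_nodes] boolean mask

    # Soft targets come from the teacher side; gradients flow only into the
    # student side because the teacher logits are detached.
    log_p_a = F.log_softmax(logits_a / tau, dim=-1)
    log_p_b = F.log_softmax(logits_b / tau, dim=-1)
    q_a = F.softmax(logits_a.detach() / tau, dim=-1)
    q_b = F.softmax(logits_b.detach() / tau, dim=-1)

    kd_b_from_a = F.kl_div(log_p_b, q_a, reduction="none").sum(-1)  # A -> B
    kd_a_from_b = F.kl_div(log_p_a, q_b, reduction="none").sum(-1)  # B -> A

    per_node = torch.where(a_teaches, kd_b_from_a, kd_a_from_b)
    return (tau ** 2) * per_node.mean()
```

In a joint training loop, such a term would typically be added with a weight to the supervised cross-entropy losses of both networks, so that each GNN is simultaneously a teacher at the nodes where it performs better and a student at the nodes where it performs worse.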

Supplemental Material

KDD22-rtfp0995.mp4 (mp4, 14 MB)


      • Published in

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 August 2022

Qualifiers

• research-article

        Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
