DOI: 10.1145/3589334.3645398 · ACM Conferences · WWW Conference Proceedings
Research article · Free access

Memory Disagreement: A Pseudo-Labeling Measure from Training Dynamics for Semi-supervised Graph Learning

Published: 13 May 2024

ABSTRACT

In semi-supervised graph learning, pseudo-labeling is a pivotal strategy for exploiting both labeled and unlabeled nodes during model training. The confidence score is currently the most widely used pseudo-labeling measure; however, it suffers from poor calibration and is unreliable on out-of-distribution data. In this paper, we propose memory disagreement (MoDis for short), a novel uncertainty measure for pseudo-labeling. We uncover that training dynamics offer significant insight into prediction uncertainty: if a graph model makes consistent predictions for an unlabeled node throughout training, the corresponding predicted label is likely to be correct, and the node is therefore a good candidate for pseudo-labeling. This idea is supported by recent studies on training dynamics. We implement MoDis as the entropy of an accumulated distribution that summarizes the disagreement among the model's predictions throughout training. We further enhance and analyze MoDis in case studies, which show that nodes with low MoDis are well suited for pseudo-labeling, as they tend to lie far from decision boundaries in both the graph and the representation space. We design a MoDis-based pseudo-label selection algorithm and a corresponding pseudo-labeling algorithm, both applicable to various graph neural networks. We empirically validate MoDis on eight benchmark graph datasets. The experimental results show that pseudo-labels selected by MoDis are of higher quality in both correctness and information gain, and that the algorithm benefits various graph neural networks, achieving an average relative improvement of 3.11%, and up to 30.24%, over the widely used confidence score. Moreover, we demonstrate the efficacy of MoDis on out-of-distribution nodes.
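The abstract describes MoDis as the entropy of an accumulated distribution of a model's predictions over training: a node whose predicted label never changes gets zero entropy (low uncertainty), while a node whose prediction flips between classes gets high entropy. The paper's exact accumulation scheme and enhancements are not given here; the following is a minimal illustrative sketch of that core idea, in which the function name `modis_scores` and the `pred_history` layout are assumptions, not the authors' API.

```python
import numpy as np

def modis_scores(pred_history):
    """Entropy-based disagreement score per node.

    pred_history: (T, N) integer array; entry [t, i] is the class the
    model predicted for node i at training checkpoint t.
    Returns an (N,) array; lower scores mean more consistent predictions,
    i.e. better pseudo-labeling candidates under the MoDis idea.
    """
    T, N = pred_history.shape
    num_classes = int(pred_history.max()) + 1
    scores = np.empty(N)
    for i in range(N):
        # Accumulate how often each class was predicted for node i.
        counts = np.bincount(pred_history[:, i], minlength=num_classes)
        p = counts / T
        nz = p[p > 0]  # drop zero-probability classes (0 * log 0 := 0)
        scores[i] = -(nz * np.log(nz)).sum()
    return scores
```

Under this sketch, pseudo-label selection would simply take the nodes with the smallest scores (e.g. `np.argsort(scores)[:k]`) and assign each its majority predicted class.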

Supplemental Material

rfp0479.mp4 (supplemental video, mp4, 126.7 MB)


Published in

WWW '24: Proceedings of the ACM Web Conference 2024
May 2024 · 4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334

Copyright © 2024 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
