ABSTRACT
Graph Neural Networks (GNNs) have succeeded in a wide range of applications, yet deep GNNs underperform their shallow counterparts, in contrast to the benefits of depth in other areas of deep learning. Over-smoothing and over-squashing are key obstacles to stacking graph convolutional layers: the former blurs node representations into indistinguishability, while the latter hinders the propagation of information from distant nodes. Our work reveals that both phenomena are intrinsically related to the spectral gap of the graph Laplacian, which induces an unavoidable trade-off: the two issues cannot be alleviated simultaneously. To strike a suitable compromise, we propose adding and removing edges. We introduce the Stochastic Jost and Liu Curvature Rewiring (SJLR) algorithm, which is computationally cheaper than previous curvature-based methods while preserving their fundamental properties. Unlike existing approaches, SJLR adds and removes edges during GNN training but leaves the graph unchanged at test time. A comprehensive comparison demonstrates SJLR's competitive performance in addressing over-smoothing and over-squashing.
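The Jost and Liu curvature that gives SJLR its name admits a cheap combinatorial lower bound on the Ollivier-Ricci curvature of an edge, computable from the two endpoint degrees and the number of triangles through the edge. The sketch below, a plain-Python illustration and not the paper's actual SJLR scoring or stochastic add/remove policy (which the abstract does not specify), shows how negatively curved bottleneck edges can be flagged as rewiring candidates:

```python
# Jost-Liu lower bound on the Ollivier-Ricci curvature of an edge (x, y):
#   kappa(x, y) >= -(1 - 1/dx - 1/dy - T/min(dx, dy))_+
#                  -(1 - 1/dx - 1/dy - T/max(dx, dy))_+
#                  + T/max(dx, dy)
# where T is the number of triangles containing (x, y) and (a)_+ = max(a, 0).

def jost_liu_lower_bound(adj, x, y):
    """adj: dict mapping node -> set of neighbors (undirected graph)."""
    dx, dy = len(adj[x]), len(adj[y])
    tri = len(adj[x] & adj[y])       # common neighbors = triangles through (x, y)
    pos = lambda a: max(a, 0.0)      # positive part (a)_+
    return (-pos(1 - 1/dx - 1/dy - tri / min(dx, dy))
            - pos(1 - 1/dx - 1/dy - tri / max(dx, dy))
            + tri / max(dx, dy))

# Toy graph: two triangles joined by a bridge (2, 3) -- a classic bottleneck.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(jost_liu_lower_bound(adj, 2, 3))  # bridge edge: negative bound
print(jost_liu_lower_bound(adj, 0, 1))  # intra-triangle edge: positive bound
```

In a curvature-based rewiring scheme, edges with strongly negative bounds mark bottlenecks (over-squashing candidates for edge addition nearby), while highly positive ones mark densely connected regions (candidates for removal to counter over-smoothing).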
On the Trade-off between Over-smoothing and Over-squashing in Deep Graph Neural Networks