Abstract
Graph Convolutional Networks (GCNs) are powerful models for data arranged as a graph, a structured non-Euclidean domain. It is known that GCNs reach high accuracy even when operating with just two layers. Another well-known result shows that the Extreme Learning Machine (ELM) is an efficient analytic learning technique to train two-layer Multi-Layer Perceptrons (MLPs). In this work, we extend ELM theory to the context of GCNs, giving rise to ELM-GCN, a novel learning mechanism for training GCNs that is faster than baseline techniques while maintaining prediction capability. We also show a theoretical upper bound on the number of hidden units required to guarantee GCN performance. To the best of our knowledge, our approach is the first to provide such theoretical guarantees while proposing a non-iterative learning algorithm to train graph convolutional networks.
References
Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 18 (2018)
Baykara, M., Abdulrahman, A.: Seizure detection based on adaptive feature extraction by applying extreme learning machines. Traitement du Signal 38(2), 331–340 (2021)
Chen, J., Ma, T., Xiao, C.: FastGCN: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247 (2018)
Chiang, W.L., Liu, X., Si, S., Li, Y., Bengio, S., Hsieh, C.J.: Cluster-GCN: an efficient algorithm for training deep and large graph convolutional networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 257–266 (2019)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Deng, W., Zheng, Q., Chen, L.: Regularized extreme learning machine. In: 2009 IEEE Symposium on Computational Intelligence and Data Mining, pp. 389–395. IEEE (2009)
Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. arXiv preprint arXiv:1706.02216 (2017)
He, B., Xu, D., Nian, R., van Heeswijk, M., Yu, Q., Miche, Y., Lendasse, A.: Fast face recognition via sparse coding and extreme learning machine. Cogn. Comput. 6(2), 264–277 (2014)
Huang, G.B., Wang, D.H., Lan, Y.: Extreme learning machines: a survey. Int. J. Mach. Learn. Cybern. 2(2), 107–122 (2011)
Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1–3), 489–501 (2006)
Inaba, F.K., Teatini Salles, E.O., Perron, S., Caporossi, G.: DGR-ELM - Distributed Generalized Regularized ELM for classification. Neurocomputing 275, 1522–1530 (2018)
Jin, G., Wang, Q., Zhu, C., Feng, Y., Huang, J., Zhou, J.: Addressing crime situation forecasting task with temporal graph convolutional neural network approach. In: 2020 12th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), pp. 474–478. IEEE (2020)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Kipf, T.N., Welling, M.: Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Li, Q., Han, Z., Wu, X.M.: Deeper insights into graph convolutional networks for semi-supervised learning. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Lv, Q., Niu, X., Dou, Y., Xu, J., Lei, Y.: Classification of hyperspectral remote sensing image using hierarchical local-receptive-field-based extreme learning machine. IEEE Geosci. Remote Sens. Lett. 13(3), 434–438 (2016)
Martínez-Martínez, J.M., Escandell-Montero, P., Soria-Olivas, E., Martín-Guerrero, J.D., Magdalena-Benedito, R., Gómez-Sanchis, J.: Regularized extreme learning machine for regression problems. Neurocomputing 74(17), 3716–3721 (2011). https://doi.org/10.1016/j.neucom.2011.06.013. https://linkinghub.elsevier.com/retrieve/pii/S092523121100378X
Seo, Y., Defferrard, M., Vandergheynst, P., Bresson, X.: Structured sequence modeling with graph convolutional recurrent networks. In: Cheng, L., Leung, A.C.S., Ozawa, S. (eds.) ICONIP 2018. LNCS, vol. 11301, pp. 362–373. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04167-0_33
Shchur, O., Mumme, M., Bojchevski, A., Günnemann, S.: Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868 (2018)
da Silva, B.L.S., Inaba, F.K., Salles, E.O.T., Ciarelli, P.M.: Outlier robust extreme learning machine for multi-target regression. Expert Syst. Appl. 140, 112877 (2020)
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)
Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., Weinberger, K.: Simplifying graph convolutional networks. In: International Conference on Machine Learning, pp. 6861–6871. PMLR (2019)
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Philip, S.Y.: A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst. (2020)
Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Yang, Z., Cohen, W., Salakhudinov, R.: Revisiting semi-supervised learning with graph embeddings. In: International Conference on Machine Learning, pp. 40–48. PMLR (2016)
Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W.L., Leskovec, J.: Graph convolutional neural networks for web-scale recommender systems. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 974–983 (2018)
You, Y., Chen, T., Wang, Z., Shen, Y.: L2-GCN: layer-wise and learned efficient training of graph convolutional networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2127–2135 (2020)
Zeng, H., Prasanna, V.: GraphACT: accelerating GCN training on CPU-FPGA heterogeneous platforms. In: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 255–265 (2020)
Zhang, K., Luo, M.: Outlier-robust extreme learning machine for regression problems. Neurocomputing 151, 1519–1527 (2015)
Zhang, M., Chen, Y.: Link prediction based on graph neural networks. arXiv preprint arXiv:1802.09691 (2018)
Zhang, Z., Cai, Y., Gong, W., Liu, X., Cai, Z.: Graph convolutional extreme learning machine. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)
Zhao, L., et al.: T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. (2019)
Acknowledgments
This work was supported by grant #2018/24516-0, São Paulo Research Foundation (FAPESP). The opinions, hypotheses, conclusions, and recommendations expressed in this material are the responsibility of the author(s) and do not necessarily reflect FAPESP's view.
Appendices
Appendix 1. Proofs
In this appendix we present the proofs of Theorems 1 and 2, which extend Extreme Learning Machine theory to Graph Convolutional Networks. We also show that RELM-GCN matches ELM-GCN when \(\gamma \rightarrow \infty \).
Theorem 1
Proof
We interpret the rows of the convolved graph signal matrix \(\hat{X}\) as the input samples of the classic ELM theorem (Theorem 2.1 from [10]). Since the rows of \(\hat{X}\) are distinct by hypothesis, \(\sigma \) is an infinitely differentiable activation function, and \(\varTheta ^{(1)}\) is randomly sampled from a continuous probability distribution, the hypotheses of the classic ELM theorem are satisfied; thus \(\sigma (\hat{X} \varTheta ^{(1)}) = \sigma (\hat{A} X \varTheta ^{(1)}) =: Y_h\) is invertible with probability one.
Since the edge weights of G are non-negative, all elements of the matrix \(\tilde{A}\) (Eq. 2) are also non-negative. Moreover, the diagonal elements of \(\tilde{A}\) are positive; thus the diagonal elements of \(\tilde{D}\) (Eq. 2) are also positive, and hence \(\tilde{D}\) and \(\tilde{D}^{-1/2}\) are invertible matrices. Furthermore, from the hypothesis that \(\tilde{A}\) is invertible, Eq. 2 shows that \(\hat{A}\) must also be invertible. Therefore, with probability one, \(\hat{A} Y_h\) is invertible, and by defining \(\varTheta ^{(2)} := (\hat{A} Y_h)^{-1}T\) we obtain \(\Vert \hat{A} Y_h \varTheta ^{(2)} - T \Vert = 0\), concluding the proof of the theorem.
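The construction in the proof can be checked numerically on a toy graph. The sketch below is illustrative only: it assumes \(\tanh \) as the activation \(\sigma \), Gaussian-sampled weights, and a hypothetical 4-node graph whose \(\tilde{A}\) happens to be invertible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph on 4 nodes (illustrative, not from the paper).
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
A_tilde = A + np.eye(4)                    # add self-loops
d_tilde = A_tilde.sum(axis=1)              # diagonal of D~, strictly positive
# Symmetric normalization: A_hat[i, j] = A~[i, j] / sqrt(d_i * d_j)
A_hat = A_tilde / np.sqrt(np.outer(d_tilde, d_tilde))

# A~ is invertible for this toy graph, so A_hat is a product of
# invertible matrices and hence invertible as well.
assert np.linalg.matrix_rank(A_hat) == 4

# ELM-GCN with H = N hidden units: random Theta1, analytic Theta2.
N, d, H, C = 4, 3, 4, 2
X = rng.standard_normal((N, d))            # node features
T = rng.standard_normal((N, C))            # targets
Theta1 = rng.standard_normal((d, H))       # sampled once, never trained
Y_h = np.tanh(A_hat @ X @ Theta1)          # sigma(A_hat X Theta1)
Theta2 = np.linalg.solve(A_hat @ Y_h, T)   # Theta2 = (A_hat Y_h)^{-1} T
err = np.linalg.norm(A_hat @ Y_h @ Theta2 - T)
print(err)                                 # essentially zero (machine precision)
```

With probability one over the random draw of \(\varTheta ^{(1)}\), the fitting error is zero, as Theorem 1 predicts for \(H = N\).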
Theorem 2
Proof
Following exactly the same argument as the classic ELM theorem (Theorem 2.2 from [10]), the validity of the theorem comes from the fact that, if it did not hold, one could choose \(H = N\), which makes \(\Vert \hat{A} Y_h \varTheta ^{(2)} - T \Vert = 0 < \epsilon \) according to Theorem 1.
Now we show that ELM-GCN is a special case of RELM-GCN when \(\gamma \rightarrow \infty \).
Proof
When \(\gamma \rightarrow \infty \) we have \(\frac{1}{\gamma }I \rightarrow 0\). Thus the analytical assignment to \(\varTheta ^{(2)}\) by RELM-GCN (last instruction of Algorithm 3) becomes
$$\varTheta ^{(2)} = \left( (\hat{A} Y_h)^\top \hat{A} Y_h + \frac{1}{\gamma }I \right) ^{-1} (\hat{A} Y_h)^\top T \;\longrightarrow \; \left( (\hat{A} Y_h)^\top \hat{A} Y_h \right) ^{-1} (\hat{A} Y_h)^\top T = (\hat{A} Y_h)^{\dagger } T,$$
which is the assignment to \(\varTheta ^{(2)}\) given by ELM-GCN (last instruction of Algorithm 2).
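This limit can be checked numerically. The sketch below assumes the standard regularized least-squares form used by regularized ELM [12, 24] for RELM-GCN's assignment (an assumption on our part, since Algorithm 3 is not reproduced here), with a random full-column-rank matrix standing in for \(\hat{A} Y_h\).

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((20, 5))   # stands in for A_hat @ Y_h (tall, full rank)
T = rng.standard_normal((20, 3))   # targets

def relm_solution(Y, T, gamma):
    """Regularized least-squares: (Y^T Y + I/gamma)^{-1} Y^T T."""
    H = Y.shape[1]
    return np.linalg.solve(Y.T @ Y + np.eye(H) / gamma, Y.T @ T)

elm = np.linalg.pinv(Y) @ T              # ELM-GCN: Moore-Penrose pseudoinverse
relm = relm_solution(Y, T, gamma=1e12)   # RELM-GCN with very large gamma
gap = np.linalg.norm(relm - elm)
print(gap)                               # shrinks toward 0 as gamma grows
```

For a tall full-column-rank \(Y\), the pseudoinverse solution equals \((Y^\top Y)^{-1} Y^\top T\), so the two assignments coincide in the limit.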
Appendix 2. Additional Experiment Details
In the following, we further validate the results obtained in the experiments involving real data. Specifically, we analyse the runs that produced Fig. 3 using the Wilcoxon signed-rank test [5], a non-parametric statistical test for paired samples. Precisely, the test compares the accuracy and training time produced by ELM-GCN or RELM-GCN against those resulting from the other two algorithms. First, we consider the null hypothesis that one of our approaches and a competing algorithm generate results according to the same distribution. If this null hypothesis is rejected, we proceed to a second step, which considers the null hypothesis that ELM-GCN or its regularized version performs worse (i.e. lower accuracy or higher training time) than the other learning technique. In all tests we use a significance level of 0.1% (i.e., 99.9% confidence).
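The two-step procedure above can be sketched with `scipy.stats.wilcoxon`. The paired accuracies below are synthetic placeholders, not the paper's data; the structure of the two tests is what matters.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)
# Hypothetical paired accuracies over 30 repeated runs (synthetic data).
acc_relm = rng.normal(0.80, 0.01, 30)
# Simulated baseline that is consistently slightly less accurate.
acc_base = acc_relm - np.abs(rng.normal(0.02, 0.005, 30))

# Step 1: two-sided test of the null "paired results share a distribution".
stat, p_two = wilcoxon(acc_relm, acc_base)
# Step 2: one-sided test of the null "RELM-GCN is not more accurate".
stat, p_one = wilcoxon(acc_relm, acc_base, alternative='greater')

alpha = 0.001   # 0.1% significance level, as in the paper
print(p_two < alpha, p_one < alpha)
```

Rejecting in step 1 establishes that the paired results differ; rejecting in step 2 establishes the direction of the difference.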
Regarding ELM-GCN, the Wilcoxon test rejected the hypothesis that this technique produces output at least as accurate as that of BP or fastGCN. However, in terms of training time, the null hypothesis that ELM-GCN's results come from the same distribution as its competitors' is rejected; moreover, the Wilcoxon test also rejects the hypothesis that ELM-GCN has a higher training time than BP or fastGCN.
Comparing RELM-GCN with BP, the Wilcoxon test could not reject the null hypothesis that both techniques produce the same accuracy. However, when RELM-GCN is compared against fastGCN, we get an interesting outcome: the null hypothesis cannot be rejected when both learning algorithms are compared in the inductive paradigm, but it is rejected in the transductive scenario. Moreover, the hypothesis that RELM-GCN is less accurate than fastGCN in that paradigm is also rejected. Indeed, a careful analysis of Fig. 3a shows that RELM-GCN consistently outperforms fastGCN on the first three datasets while performing comparably on Reddit.
Furthermore, the Wilcoxon test rejected the null hypothesis that RELM-GCN is as fast as the competing techniques, regardless of the algorithm and paradigm chosen for comparison. Moreover, the second step of the test showed that we should also reject the hypothesis that RELM-GCN is slower than the other algorithms in either learning paradigm. The conclusions provided by the Wilcoxon test are consistent with the training times shown in Fig. 3, since RELM-GCN outperforms the competing algorithms on most datasets, being comparable only with fastGCN on Pubmed.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gonçalves, T., Nonato, L.G. (2022). Extreme Learning Machine to Graph Convolutional Networks. In: Xavier-Junior, J.C., Rios, R.A. (eds.) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol. 13654. Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21688-6
Online ISBN: 978-3-031-21689-3