ABSTRACT
Graph machine learning algorithms are commonly applied to a broad range of prediction tasks in systems biology. These algorithms present many design choices that depend on the specific application and the available data, which makes it difficult to choose among the options. An important design criterion is the definition of "topological similarity" between two nodes in a network, which is used to design convolution matrices for graph convolution or loss functions for evaluating node embeddings. Many measures of topological similarity exist in the network science literature (e.g., random-walk-based proximity, shared neighborhood), and recent comparative studies show that the choice of topological similarity can have a significant effect on the performance and reliability of graph machine learning models.
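To make the notion of topological similarity concrete, the following minimal sketch computes two of the measure families named above on a toy network: a shared-neighborhood count and a random-walk-based proximity (random walk with restart, in closed form). The function names, the restart probability of 0.15, and the four-node path graph are illustrative assumptions, not taken from the GraphCan implementation.

```python
import numpy as np

def common_neighbors(A):
    """Shared-neighborhood similarity: S[i, j] = number of neighbors i and j share."""
    S = A @ A
    np.fill_diagonal(S, 0)  # ignore self-similarity
    return S

def random_walk_with_restart(A, restart=0.15):
    """Random-walk proximity in closed form: restart * (I - (1 - restart) * P)^-1,
    where P is the column-normalized adjacency matrix.
    Column j holds the stationary visiting probabilities of a walk restarting at node j."""
    deg = A.sum(axis=0)
    deg[deg == 0] = 1.0          # avoid division by zero for isolated nodes
    P = A / deg                  # divides column j by the degree of node j
    n = A.shape[0]
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)

# Toy network (assumed for illustration): a path 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

cn = common_neighbors(A)
rwr = random_walk_with_restart(A)

print("common neighbors:\n", cn)
print("random walk with restart:\n", np.round(rwr, 3))
```

On this toy graph the two measures disagree about adjacent nodes: nodes 0 and 1 share no neighbor, so their shared-neighborhood score is zero, while their random-walk proximity is comparatively high. Differences of this kind are one reason the choice of similarity measure can change downstream results.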
We propose GraphCan, a framework for computing canonical representations of biological networks using a similarity-based Graph Convolutional Network (GCN). GraphCan integrates multiple node similarity measures to compute canonical node embeddings for a given network. The resulting embeddings can be used directly in downstream machine learning tasks. We comprehensively evaluate GraphCan on a variety of link prediction tasks in systems biology. Our results show that GraphCan consistently delivers higher prediction accuracy than algorithms that directly use the adjacency matrix of the input network, and that integrating multiple similarity measures improves the robustness of the framework. The implementation of GraphCan is available at https://github.com/Meng-zhen-Li/Similarity-based-GCN.git.
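The general idea of a similarity-based graph convolution can be sketched as follows: a node similarity matrix takes the place of the adjacency matrix as the convolution matrix of a GCN layer. This is a rough illustration only and not the GraphCan algorithm. The symmetric normalization follows the standard GCN formulation, while the untrained random weights, the identity node features, and the averaging of per-measure embeddings are placeholder assumptions made here for brevity.

```python
import numpy as np

def normalize_similarity(S):
    """Symmetric normalization D^{-1/2} (S + I) D^{-1/2}, as applied to the
    adjacency matrix in a standard GCN, here applied to a similarity matrix S."""
    S_hat = S + np.eye(S.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(S_hat.sum(axis=1))
    return S_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(C, X, W):
    """One graph convolution with convolution matrix C: ReLU(C X W)."""
    return np.maximum(C @ X @ W, 0.0)

# Toy network (assumed for illustration): a path 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Two candidate convolution matrices: the adjacency itself and a
# shared-neighborhood (two-step path count) similarity.
similarities = {"adjacency": A, "common_neighbors": A @ A}

rng = np.random.default_rng(0)
X = np.eye(4)                    # featureless nodes: identity features (assumption)
W = rng.normal(size=(4, 2))      # untrained weights, hypothetical 2-d embedding

per_measure = [gcn_layer(normalize_similarity(S), X, W)
               for S in similarities.values()]

# Placeholder integration step: average the per-measure embeddings.
embedding = np.mean(per_measure, axis=0)
print(embedding.shape)
```

In a real pipeline the weights would be trained, for example with a link reconstruction loss, and the integration step would follow whatever scheme the released implementation uses.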