Comparison and evaluation of network clustering algorithms applied to genetic interaction networks

Frontiers in Bioscience-Elite (FBE) is published by IMR Press from Volume 13 Issue 2 (2021). Previous articles were published by another publisher on a subscription basis, and they are hosted by IMR Press on imrpress.com as a courtesy and upon agreement with Frontiers in Bioscience.

Article

Lin Hou¹,²,³, Lin Wang², Arthur Berg⁴, Minping Qian¹,², Yunping Zhu³, Fangting Li⁵, Minghua Deng¹,²,⁶,*

¹ LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, China

² Center for Theoretical Biology, Peking University, Beijing 100871, China

³ State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 102206, China

⁴ Center for Statistical Genetics, Pennsylvania State University, Hershey, Pennsylvania, USA

⁵ School of Physics, Peking University, Beijing 100871, China

⁶ Center for Statistical Science, Peking University, Beijing 100871, China

*Author to whom correspondence should be addressed.

Front. Biosci. (Elite Ed) 2012, 4(6), 2150–2161; https://doi.org/10.2741/e532

Published: 1 January 2012

Abstract

The goal of network clustering algorithms is to detect dense clusters in a network, which provides a first step towards understanding large-scale biological networks. With numerous recent advances in biotechnology, large-scale genetic interaction data are widely available, but there is limited understanding of which clustering algorithms are most effective. To address this problem, we conducted a systematic study to compare and evaluate six clustering algorithms for analyzing genetic interaction networks, and investigated the factors that influence the choice of algorithm. The algorithms considered in this comparison are hierarchical clustering, topological overlap matrix, bi-clustering, Markov clustering, Bayesian discriminant analysis based community detection, and the variational Bayes approach to modularity. Both experimentally identified and synthetically constructed networks were used in this comparison. The accuracy of each algorithm is measured by the Jaccard index, which compares predicted gene modules with benchmark gene sets. The results suggest that the best choice of algorithm depends on the network topology and the evaluation criteria. Hierarchical clustering proved best at predicting protein complexes; Bayesian discriminant analysis based community detection proved best on epistatic miniarray profile (EMAP) datasets; and the variational Bayes approach to modularity was noticeably better than the other algorithms on the genome-scale networks.
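
As an illustration of the accuracy measure named above, the Jaccard index of two gene sets A and B is |A ∩ B| / |A ∪ B|. The following is a minimal Python sketch, not the authors' implementation. The best-match scoring rule (`best_match_scores`) and the toy gene names are assumptions made for illustration only; the abstract does not specify how modules are paired with benchmark sets.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here: two empty sets score 0.0
        return 0.0
    return len(a & b) / len(union)


def best_match_scores(predicted, benchmark):
    """For each predicted module, return its highest Jaccard index
    against any benchmark gene set (assumed scoring rule)."""
    return [
        max((jaccard(module, ref) for ref in benchmark), default=0.0)
        for module in predicted
    ]


# Hypothetical toy data; gene names are placeholders
predicted_modules = [{"g1", "g2", "g3"}, {"g4", "g5"}]
benchmark_sets = [{"g1", "g2", "g3", "g6"}, {"g5", "g7"}]

scores = best_match_scores(predicted_modules, benchmark_sets)
print(scores)
```

A score of 1 means a predicted module coincides exactly with a benchmark set, and a score of 0 means it shares no genes with any of them.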

Keywords

Genetic interaction

Network

Clustering algorithm

Jaccard index

Comparison

Epistatic miniarray profiles

Front. Biosci. (Elite Ed) Print ISSN 1945-0494 Electronic ISSN 1945-0508