clustAnalytics: An R Package for Assessing Stability and Significance of Communities in Networks

This paper introduces the R package clustAnalytics, which provides a set of criteria for assessing the significance and stability of communities found in networks by any clustering algorithm. clustAnalytics works with graphs of class igraph from the R package igraph, extended to handle weighted and/or directed graphs. For testing cluster significance, it provides a set of community scoring functions, together with methods to systematically compare their values to those obtained under a suitable null model. For testing cluster stability, it provides a nonparametric bootstrap method combined with similarity metrics derived from information theory and combinatorics. Finally, it provides a method to synthetically generate weighted networks with a ground truth community structure, based on the preferential attachment model, which yields networks with communities and a scale-free degree distribution.
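To make the idea of partition similarity metrics concrete, the following minimal sketch computes two widely used ones for a pair of community labelings: the variation of information (information-theoretic) and the adjusted Rand index (combinatorial). It is an illustration in plain Python, not the package's implementation; the function names `variation_of_information` and `adjusted_rand_index` are hypothetical, and the choice of these two metrics as representatives is an assumption.

```python
from collections import Counter
from math import comb, log


def variation_of_information(a, b):
    """VI(a, b) = H(a | b) + H(b | a), in nats; 0 means identical partitions."""
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    n = len(a)
    size_a, size_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # contingency table of the two labelings
    vi = 0.0
    for (x, y), n_xy in joint.items():
        # Each cell contributes -p_xy * (log p(x|y) + log p(y|x)).
        vi -= (n_xy / n) * (log(n_xy / size_b[y]) + log(n_xy / size_a[x]))
    return vi


def adjusted_rand_index(a, b):
    """Rand index corrected for chance; 1 means identical partitions."""
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    n = len(a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_joint - expected) / (maximum - expected)


# Two labelings of six vertices that differ only in label names,
# and a third labeling that splits the vertices differently.
original = [0, 0, 0, 1, 1, 1]
relabeled = [1, 1, 1, 0, 0, 0]
different = [0, 0, 1, 1, 2, 2]

print(variation_of_information(original, relabeled))
print(adjusted_rand_index(original, relabeled))
print(variation_of_information(original, different))
print(adjusted_rand_index(original, different))
```

Both metrics depend only on how vertices are grouped, not on the label names, which is why they suit comparing the clustering of an original network with the clustering of a resampled one.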

Martí Renedo-Mirambell (Department of Computer Sciences), Argimiro Arratia (Soft Computing Research Group (SOCO))
2023-11-01

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded as RJ-2023-057.zip.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Renedo-Mirambell & Arratia, "clustAnalytics: An R Package for Assessing Stability and Significance of Communities in Networks", The R Journal, 2023

BibTeX citation

@article{RJ-2023-057,
  author = {Renedo-Mirambell, Martí and Arratia, Argimiro},
  title = {clustAnalytics: An R Package for Assessing Stability and Significance of Communities in Networks},
  journal = {The R Journal},
  year = {2023},
  note = {https://doi.org/10.32614/RJ-2023-057},
  doi = {10.32614/RJ-2023-057},
  volume = {15},
  issue = {2},
  issn = {2073-4859},
  pages = {134-144}
}