ABSTRACT
Community detection is an important task in network analysis. A community (also referred to as a cluster) is a set of cohesive vertices that have more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. One of the most successful techniques for finding overlapping communities is based on local optimization and expansion of a community metric around a seed set of vertices. In this paper, we propose an efficient overlapping community detection algorithm using a seed set expansion approach. In particular, we develop new seeding strategies for a personalized PageRank scheme that optimizes the conductance community score. The key idea of our algorithm is to find good seeds, and then expand these seed sets using the personalized PageRank clustering procedure. Experimental results show that this seed set expansion approach outperforms other state-of-the-art overlapping community detection methods. We also show that our new seeding strategies are better than previous strategies, and are thus effective in finding good overlapping clusters in a graph.
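To make the expansion step concrete, the following is a minimal sketch of growing one seed set into a community with an approximate personalized PageRank vector, followed by a sweep that selects the lowest-conductance prefix. It is not the authors' implementation, and it does not reproduce the seeding strategies proposed in the paper: the seed is supplied by hand. The function names (`approx_ppr`, `sweep_cut`, `expand_seed`), the teleportation probability `alpha`, the tolerance `eps`, and the toy graph are all assumptions made for illustration. The graph is assumed to be undirected and unweighted, with no self-loops and no isolated vertices.

```python
from collections import deque


def approx_ppr(adj, seeds, alpha=0.15, eps=1e-4):
    """Push-style approximation of a personalized PageRank vector.

    adj maps each vertex to the set of its neighbors. alpha is the
    teleportation probability (an assumed value, not taken from the paper).
    """
    p = {}                                       # approximate PageRank mass
    r = {s: 1.0 / len(seeds) for s in seeds}     # residual mass still to push
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        deg = len(adj[u])
        mass = r.get(u, 0.0)
        if mass < eps * deg:                     # residual too small to push
            continue
        p[u] = p.get(u, 0.0) + alpha * mass      # keep a fraction alpha at u
        r[u] = 0.0
        share = (1.0 - alpha) * mass / deg       # spread the rest to neighbors
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
    return p


def sweep_cut(adj, p):
    """Return the prefix of the degree-normalized ordering with minimum conductance."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda v: p[v] / len(adj[v]), reverse=True)
    inside, vol, cut = set(), 0, 0
    best_phi, best_size = float("inf"), 0
    for size, v in enumerate(order, start=1):
        links_in = sum(1 for w in adj[v] if w in inside)
        inside.add(v)
        vol += len(adj[v])
        cut += len(adj[v]) - 2 * links_in        # new cut edges minus absorbed ones
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_size = cut / denom, size
    return set(order[:best_size]), best_phi


def expand_seed(adj, seeds, alpha=0.15, eps=1e-4):
    """Expand a seed set into a community: PageRank vector, then sweep."""
    return sweep_cut(adj, approx_ppr(adj, seeds, alpha, eps))


def two_cliques():
    """Toy graph: two 4-cliques (0-3 and 4-7) joined by the single edge 3-4."""
    adj = {v: set() for v in range(8)}
    for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
        for a in group:
            for b in group:
                if a != b:
                    adj[a].add(b)
    adj[3].add(4)
    adj[4].add(3)
    return adj


if __name__ == "__main__":
    community, phi = expand_seed(two_cliques(), [0])
    print(sorted(community), phi)
```

On this toy graph, expanding from vertex 0 should recover the first clique {0, 1, 2, 3}, whose conductance is 1/13 (one cut edge over a volume of 13). Overlap arises when the expansion is run from several seeds, because each run returns its own vertex set and nothing prevents a vertex from appearing in more than one of them.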