Abstract
Real-world networks are often organized as modules, or communities, of similar nodes that serve as functional units. These networks are also rich in content: nodes carry distinguishing features or attributes. Discovering a network’s modular structure therefore requires taking into account not only its links but also its node attributes. We describe an information-theoretic method that identifies modules by compressing descriptions of information flow on a network. Our formulation introduces node content into the description of information flow, and we minimize the length of this description to discover groups of nodes that have similar attributes and also tend to trap the flow of information. The method is conceptually simple and does not require ad hoc parameters to specify the number of modules or to control the relative contribution of links and node attributes to network structure. We apply the proposed method to partition real-world networks with known community structure. We demonstrate that adding node attributes recovers the underlying community structure in content-rich networks more effectively than using links alone. In addition, we show that our method is faster and more accurate than alternative state-of-the-art algorithms.
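To make the idea of compressing a description of information flow concrete, the sketch below computes the two-level description length of a random walk on an undirected, unweighted network for a given partition. It uses the standard link-only map equation of Rosvall and Bergstrom (2008), not the content-aware formulation described in this abstract, which is not reproduced here. The function names `plogp` and `description_length`, the toy graph, and both partitions are illustrative assumptions.

```python
from collections import defaultdict
from math import log2


def plogp(p):
    """Return p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def description_length(edges, module_of):
    """Link-only map equation (bits per step) for an undirected, unweighted graph.

    edges: list of (u, v) pairs; module_of: dict mapping node -> module label.
    Assumption: this is the standard map equation without node attributes.
    """
    degree = defaultdict(int)
    boundary = defaultdict(int)  # edge ends that leave each module
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            boundary[module_of[u]] += 1
            boundary[module_of[v]] += 1

    total = 2 * len(edges)
    # Stationary visit rate of a random walk is proportional to node degree.
    visit = {node: d / total for node, d in degree.items()}
    modules = {module_of[node] for node in degree}
    # Probability per step that the walker exits each module.
    exit_rate = {m: boundary[m] / total for m in modules}
    # Total visit rate of the nodes inside each module.
    inside = defaultdict(float)
    for node, p in visit.items():
        inside[module_of[node]] += p

    return (
        plogp(sum(exit_rate.values()))
        - 2 * sum(plogp(q) for q in exit_rate.values())
        - sum(plogp(p) for p in visit.values())
        + sum(plogp(exit_rate[m] + inside[m]) for m in modules)
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {n: "A" for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

print("one module :", description_length(edges, one_module))
print("two modules:", description_length(edges, two_modules))
```

On this toy graph, splitting at the bridge gives a shorter description than treating the whole network as one module, because the walker rarely crosses the bridge. A search procedure would compare many candidate partitions in the same way and keep the one with the shortest description.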
Index Terms
- Partitioning Networks with Node Attributes by Compressing Information Flow