research-article

Partitioning Networks with Node Attributes by Compressing Information Flow

Published: 19 November 2016

Abstract

Real-world networks are often organized as modules or communities of similar nodes that serve as functional units. These networks are also rich in content, with nodes having distinguishing features or attributes. To discover a network’s modular structure, it is necessary to take into account not only its links but also node attributes. We describe an information-theoretic method that identifies modules by compressing descriptions of information flow on a network. Our formulation introduces node content into the description of information flow, which we then minimize to discover groups of nodes with similar attributes that also tend to trap the flow of information. The method is conceptually simple and does not require ad-hoc parameters to specify the number of modules or to control the relative contribution of links and node attributes to network structure. We apply the proposed method to partition real-world networks with known community structure. We demonstrate that adding node attributes helps recover the underlying community structure in content-rich networks more effectively than using links alone. In addition, we show that our method is faster and more accurate than alternative state-of-the-art algorithms.
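Compressing descriptions of information flow is the idea behind the two-level map equation of Rosvall and Bergstrom (2008), which scores a module partition by the average number of bits per step needed to describe a random walk on the network. The sketch below computes that standard, link-only quantity for an undirected, unweighted graph with no teleportation. It is an illustration under those assumptions, not the node-attribute formulation described in this article, whose exact form the abstract does not state; the helper names (`plogp`, `map_equation`) and the toy graph are hypothetical.

```python
import math
from collections import defaultdict


def plogp(x: float) -> float:
    """Return x * log2(x), using the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, partition):
    """Two-level map equation L(M) in bits per step (hypothetical helper).

    Standard link-only form for an undirected, unweighted graph with no
    teleportation; it does NOT model node attributes.
    edges: iterable of (u, v) pairs; partition: dict mapping node -> module id.
    """
    degree = defaultdict(int)
    exit_endpoints = defaultdict(int)  # edge endpoints that lead out of a module
    n_edges = 0
    for u, v in edges:
        n_edges += 1
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            exit_endpoints[partition[u]] += 1
            exit_endpoints[partition[v]] += 1
    two_m = 2 * n_edges

    # Stationary visit rate of a random walk on an undirected graph: degree / 2m.
    p = {node: d / two_m for node, d in degree.items()}
    modules = set(partition.values())
    # Exit rate q_i: per-step probability that the walker leaves module i.
    q = {i: exit_endpoints[i] / two_m for i in modules}
    flow_in = defaultdict(float)  # total visit rate of the nodes in each module
    for node, rate in p.items():
        flow_in[partition[node]] += rate

    # Expanded form of q*H(Q) + sum_i p_i*H(P_i).
    q_total = sum(q.values())
    return (plogp(q_total)
            - 2 * sum(plogp(qi) for qi in q.values())
            - sum(plogp(pa) for pa in p.values())
            + sum(plogp(q[i] + flow_in[i]) for i in modules))


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    one_module = {n: 0 for n in range(6)}
    two_modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(round(map_equation(edges, one_module), 3))
    print(round(map_equation(edges, two_modules), 3))
```

On this toy graph, splitting the two triangles should give a shorter description (about 2.32 bits per step) than a single module (about 2.56 bits, the entropy of the visit rates), so a procedure that minimizes L(M) would prefer the split. An attribute-aware variant would change how the flow and its description are defined, which this sketch does not attempt.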

        • Published in

          ACM Transactions on Knowledge Discovery from Data, Volume 11, Issue 2
          May 2017
          419 pages
          ISSN: 1556-4681
          EISSN: 1556-472X
          DOI: 10.1145/3017677

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 19 November 2016
          • Accepted: 1 July 2016
          • Revised: 1 June 2016
          • Received: 1 June 2015
          Published in tkdd Volume 11, Issue 2

          Qualifiers

          • research-article
          • Research
          • Refereed
