DOI: 10.1145/1135777.1135807
Article

Probabilistic models for discovering e-communities

Published: 23 May 2006

ABSTRACT

The increasing amount of communication between individuals in e-formats (e.g., email, instant messaging, and the Web) has motivated computational research in social network analysis (SNA). Previous work in SNA has emphasized the social network (SN) topology measured by communication frequencies while ignoring the semantic information in SNs. In this paper, we propose two generative Bayesian models for semantic community discovery in SNs, combining probabilistic modeling with community detection in SNs. To simulate the generative models, an EnF-Gibbs sampling algorithm is proposed to address the efficiency and performance problems of traditional methods. Experimental studies on the Enron email corpus show that our approach successfully detects the communities of individuals and, in addition, provides semantic topic descriptions of these communities.

            Published in

            WWW '06: Proceedings of the 15th international conference on World Wide Web
            May 2006
            1102 pages
            ISBN: 1595933239
            DOI: 10.1145/1135777

            Copyright © 2006 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
