ABSTRACT
The increasing amount of communication between individuals in e-formats (e.g., email, instant messaging, and the Web) has motivated computational research in social network analysis (SNA). Previous work in SNA has emphasized social network (SN) topology, measured by communication frequencies, while ignoring the semantic information in SNs. In this paper, we propose two generative Bayesian models for semantic community discovery in SNs, combining probabilistic modeling with community detection. To simulate the generative models, we propose an EnF-Gibbs sampling algorithm that addresses the efficiency and performance problems of traditional methods. Experimental studies on the Enron email corpus show that our approach successfully detects communities of individuals and, in addition, provides semantic topic descriptions of these communities.
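The abstract does not spell out the two models or the EnF-Gibbs procedure, so the following is only an illustrative sketch of the general technique it builds on: a plain collapsed Gibbs sampler for a standard LDA-style topic model. It is not the authors' algorithm. The function name `gibbs_lda`, the hyperparameters `alpha` and `beta`, and the toy word-id documents are all assumptions introduced for illustration.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=100, seed=0):
    """Collapsed Gibbs sampling for a basic LDA-style topic model.

    docs is a list of documents, each a list of integer word ids.
    This is a generic textbook sampler (hypothetical helper), not EnF-Gibbs.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]            # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                          # n(k)
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Unnormalized conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return assign, doc_topic, topic_word


# Toy corpus: word ids 0-2 and 3-5 tend to co-occur in separate documents.
toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 0], [3, 5, 4, 4]]
assign, doc_topic, topic_word = gibbs_lda(toy_docs, num_topics=2, vocab_size=6)
```

Each sweep resamples one token's topic at a time from its conditional distribution given all other assignments, which is the basic Gibbs step that community-level extensions of such models would presumably build on.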
Index Terms
- Probabilistic models for discovering e-communities