Article

Probabilistic models for discovering e-communities

Authors:
Ding Zhou, Pennsylvania State University, University Park, PA
Eren Manavoglu, Pennsylvania State University, University Park, PA
Jia Li, Pennsylvania State University, University Park, PA
C. Lee Giles, Pennsylvania State University, University Park, PA
Hongyuan Zha, Pennsylvania State University, University Park, PA

WWW '06: Proceedings of the 15th international conference on World Wide Web, May 2006, Pages 173–182, https://doi.org/10.1145/1135777.1135807

Published: 23 May 2006

ABSTRACT

The increasing amount of communication between individuals in e-formats (e.g., email, instant messaging, and the Web) has motivated computational research in social network analysis (SNA). Previous work in SNA has emphasized the social network (SN) topology measured by communication frequencies while ignoring the semantic information in SNs. In this paper, we propose two generative Bayesian models for semantic community discovery in SNs, combining probabilistic modeling with community detection in SNs. To simulate the generative models, an EnF-Gibbs sampling algorithm is proposed to address the efficiency and performance problems of traditional methods. Experimental studies on the Enron email corpus show that our approach successfully detects the communities of individuals and, in addition, provides semantic topic descriptions of these communities.

Published in
WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN: 1595933239
DOI: 10.1145/1135777
General Chairs:
Leslie Carr, University of Southampton
David De Roure, University of Southampton
Arun Iyengar, IBM Research
Program Chairs:
Carole Goble, University of Manchester, UK
Mike Dahlin, University of Texas at Austin
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Gibbs sampling
clustering
data mining
email
social network
statistical modeling
Qualifiers
- Article
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%