Topic Mining Based on Graph Local Clustering

Garza Villarreal, Sara Elena; Brena, Ramón F.

doi:10.1007/978-3-642-25330-0_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7095)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

This paper introduces an approach for discovering thematically related document groups (a topic mining task) in massive document collections with the aid of graph local clustering. The document collection is viewed as a directed graph in which vertices represent documents and arcs represent connections among them (e.g. hyperlinks). Because a document is likely to have more connections to documents of the same theme, we assume that topics have the structure of a graph cluster, i.e. a group of vertices with more arcs to the inside of the group than to the outside of it. Topics can therefore be discovered by clustering the document graph; we use a local approach to cope with scalability. We also extract properties (keywords and most representative documents) from the clusters to summarize each topic. The approach was tested on the Wikipedia collection, and we observed that the resulting clusters do correspond to topics, which shows that topic mining can be treated as a graph clustering problem.
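
To make the notion of a graph cluster concrete, the following is a minimal Python sketch of local clustering on a small directed document graph. It is an illustration under stated assumptions, not the algorithm from the chapter: the density score (internal arcs divided by all arcs touching the group), the greedy one-vertex-at-a-time expansion, and the names build_graph, density and local_cluster are all hypothetical choices made for this example. The search is local in the sense that it only inspects the neighbourhood of the current cluster, never the whole graph.

```python
from collections import defaultdict


def build_graph(arcs):
    """Return successor and predecessor maps for a list of (source, target) arcs."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in arcs:
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred


def density(cluster, succ, pred):
    """Fraction of the arcs touching `cluster` that stay inside it (assumed score)."""
    internal = external = 0
    for u in cluster:
        for v in succ[u]:
            if v in cluster:
                internal += 1  # arc from a member to a member
            else:
                external += 1  # arc leaving the cluster
        # arcs entering the cluster from outside
        external += sum(1 for w in pred[u] if w not in cluster)
    total = internal + external
    return internal / total if total else 0.0


def local_cluster(seed, succ, pred):
    """Greedily grow a cluster around `seed` while the density score improves."""
    cluster = {seed}
    best = density(cluster, succ, pred)
    while True:
        # Only vertices adjacent to the current cluster are candidates.
        frontier = set()
        for u in cluster:
            frontier |= succ[u] | pred[u]
        frontier -= cluster
        candidate, candidate_score = None, best
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            score = density(cluster | {v}, succ, pred)
            if score > candidate_score:
                candidate, candidate_score = v, score
        if candidate is None:  # no neighbour improves the score: stop
            return cluster
        cluster.add(candidate)
        best = candidate_score


if __name__ == "__main__":
    # Toy collection: two tightly linked groups joined by a single hyperlink c -> x.
    arcs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"), ("c", "b"),
            ("x", "y"), ("y", "x"), ("x", "z"), ("z", "x"), ("y", "z"), ("z", "y"),
            ("c", "x")]
    succ, pred = build_graph(arcs)
    print(sorted(local_cluster("a", succ, pred)))
    print(sorted(local_cluster("x", succ, pred)))
```

On this toy graph, starting from either "a" or "x" recovers the group that the seed belongs to, because adding a vertex from the other group lowers the share of arcs that stay inside the cluster.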

Author information

Authors and Affiliations

Universidad Autónoma de Nuevo León, San Nicolás de los Garza, NL, 66450, Mexico
Sara Elena Garza Villarreal
Tec de Monterrey, Monterrey, NL, 64849, Mexico
Ramón F. Brena

Editor information

Editors and Affiliations

Mexican Petroleum Institute (IMP), Eje Central Lazaro Cardenas Norte, 152, Col. San Bartolo Atepehuacan, CP 07730, Mexico DF, Mexico
Ildar Batyrshin
National Polytechnic Institute (IPN), Center for Computing Research (CIC), Av. Juan de Dios Bátiz, s/n, Col. Nueva Industrial Vallejo, CP 07738, Mexico D.F., Mexico
Grigori Sidorov

About this paper

Cite this paper

Garza Villarreal, S.E., Brena, R.F. (2011). Topic Mining Based on Graph Local Clustering. In: Batyrshin, I., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2011. Lecture Notes in Computer Science, vol 7095. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25330-0_18

DOI: https://doi.org/10.1007/978-3-642-25330-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25329-4
Online ISBN: 978-3-642-25330-0
eBook Packages: Computer Science, Computer Science (R0)