Topic Mining Based on Graph Local Clustering

  • Conference paper
Advances in Soft Computing (MICAI 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7095)

Abstract

This paper introduces an approach for discovering groups of thematically related documents (a topic mining task) in massive document collections by means of graph local clustering. The document collection is viewed as a directed graph in which vertices represent documents and arcs represent connections among them (e.g., hyperlinks). Because a document is likely to have more connections to documents of the same theme, we assume that topics have the structure of a graph cluster, i.e., a group of vertices with many arcs pointing inside the group and few arcs pointing outside it. Topics can therefore be discovered by clustering the document graph; we use a local approach to cope with scalability. We also extract properties (keywords and the most representative documents) from each cluster to provide a summary of its topic. The approach was tested on the Wikipedia collection, and we observed that the resulting clusters do correspond to topics, which shows that topic mining can be treated as a graph clustering problem.
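The notion of a graph cluster used above can be illustrated with a small sketch. The abstract does not specify the paper's fitness function or search strategy, so everything below is an assumption made for illustration: the names `relative_density` and `local_cluster`, the score (internal arcs divided by all arcs leaving cluster members), and the greedy expansion from a seed document. Only the neighbourhood of the seed is explored, which is what makes the approach local.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: document -> set of documents it links to.
Graph = Dict[str, Set[str]]


def arc_counts(graph: Graph, cluster: Set[str]) -> Tuple[int, int]:
    """Count arcs leaving cluster members: (to the inside, to the outside)."""
    internal = external = 0
    for doc in cluster:
        for target in graph.get(doc, set()):
            if target in cluster:
                internal += 1
            else:
                external += 1
    return internal, external


def relative_density(graph: Graph, cluster: Set[str]) -> float:
    """Assumed score: fraction of the cluster's outgoing arcs that stay inside."""
    internal, external = arc_counts(graph, cluster)
    total = internal + external
    return internal / total if total else 0.0


def local_cluster(graph: Graph, seed: str, max_size: int = 50) -> Set[str]:
    """Greedily grow a cluster around `seed`, looking only at its frontier."""
    cluster = {seed}
    while len(cluster) < max_size:
        # Candidates are documents linked from the cluster but not yet in it.
        frontier = {t for d in cluster for t in graph.get(d, set())} - cluster
        best, best_score = None, relative_density(graph, cluster)
        for candidate in sorted(frontier):  # sorted for deterministic ties
            score = relative_density(graph, cluster | {candidate})
            if score > best_score:
                best, best_score = candidate, score
        if best is None:  # no candidate improves the score: stop expanding
            break
        cluster.add(best)
    return cluster


# Toy document graph: two densely linked groups joined by a single arc c -> x.
toy_graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "x"},
    "x": {"y", "z"},
    "y": {"x", "z"},
    "z": {"x", "y"},
}

topic = local_cluster(toy_graph, "a")
print(sorted(topic), relative_density(toy_graph, topic))
```

On this toy graph the expansion from seed `a` stops at the first densely linked group, because adding `x` would lower the score. A real collection would need a more careful score and stopping rule than this sketch provides.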

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Garza Villarreal, S.E., Brena, R.F. (2011). Topic Mining Based on Graph Local Clustering. In: Batyrshin, I., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2011. Lecture Notes in Computer Science, vol 7095. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25330-0_18

  • DOI: https://doi.org/10.1007/978-3-642-25330-0_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25329-4

  • Online ISBN: 978-3-642-25330-0

  • eBook Packages: Computer Science, Computer Science (R0)