Making Topic Words Distribution More Accurate and Ranking Topic Significance According to the Jensen-Shannon Divergence from Background Topic

Fujino, Iwao; Hoshino, Yuko

doi:10.1007/978-3-319-20910-4_14

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9165)

Abstract

This paper presents an approach, applied as a post-processing step to the latent Dirichlet allocation (LDA) method, for making topic word distributions more accurate and for ranking topic significance by the Jensen-Shannon divergence from a background topic. First, we define a term score parameter for representing topics, which suppresses the correlation between different topics and makes the word distributions more accurate. Next, based on the correlation between different topics, we describe a concrete method for determining a proper setting for the number of topics. We then propose a method for ranking topic significance in order of the Jensen-Shannon divergence from the background topic. To confirm the proposed methods, we conducted several experiments on English Twitter streaming data. The results of these experiments indicate that the methods work efficiently, as expected.
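
The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the background topic is constructed, so the code below assumes, purely for illustration, a background topic that is uniform over the vocabulary. The function names (`kl_divergence`, `js_divergence`, `rank_topics`) and the toy distributions are hypothetical and are not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def rank_topics(topic_word, background):
    """Return (topic index, JS divergence) pairs, most divergent topic first."""
    scores = [(k, js_divergence(dist, background)) for k, dist in enumerate(topic_word)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy topic-word distributions over a 4-word vocabulary (illustrative values only).
topic_word = [
    [0.25, 0.25, 0.25, 0.25],  # flat, background-like topic
    [0.70, 0.10, 0.10, 0.10],  # moderately peaked topic
    [0.97, 0.01, 0.01, 0.01],  # sharply peaked topic
]

# Assumption: the background topic is uniform over the vocabulary.
background = [0.25] * 4

ranking = rank_topics(topic_word, background)
for topic, score in ranking:
    print(f"topic {topic}: JS divergence = {score:.4f}")
```

With base-2 logarithms the divergence lies between 0 and 1, so a topic identical to the background scores 0 and falls to the bottom of the ranking. A different background definition would change the scores but not the ranking procedure.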

Notes

1. http://www.cs.princeton.edu/~blei/lda-c/index.html

Author information

Authors and Affiliations

School of Information and Telecommunication Engineering, Tokai University, Tokyo, Japan
Iwao Fujino & Yuko Hoshino

Corresponding author

Correspondence to Iwao Fujino.

Editor information

Editors and Affiliations

IBaI, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Fujino, I., Hoshino, Y. (2015). Making Topic Words Distribution More Accurate and Ranking Topic Significance According to the Jensen-Shannon Divergence from Background Topic. In: Perner, P. (eds) Advances in Data Mining: Applications and Theoretical Aspects. ICDM 2015. Lecture Notes in Computer Science, vol 9165. Springer, Cham. https://doi.org/10.1007/978-3-319-20910-4_14

DOI: https://doi.org/10.1007/978-3-319-20910-4_14
Published: 20 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20909-8
Online ISBN: 978-3-319-20910-4
eBook Packages: Computer Science, Computer Science (R0)
