
Heuristic Pretraining for Topic Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9101)

Abstract

This paper proposes a heuristic pretraining method for topic models. While we consider latent Dirichlet allocation (LDA) here, the pretraining can be applied to other topic models. Basically, we use collapsed Gibbs sampling (CGS) to update the latent variables. However, after every iteration of CGS, we regard the latent variables as observable and construct another LDA over them, which we call LDA over LDA (LoL). We then perform two further updates: an update of the latent variables in LoL by CGS, and an update of the latent variables in LDA based on the result of that LoL update. One iteration of CGS for LDA and these two updates are performed alternately, but only during a short early phase of inference; that is, the proposed method serves as a pretraining. The pretraining stage is followed by the usual iterations of CGS for LDA. Our evaluation experiments show that the pretraining improves test-set perplexity.
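To make the alternating scheme concrete, here is a minimal Python sketch of the two-level loop. It is an illustration under stated assumptions, not the authors' implementation: the corpus format, the hyperparameter values, and in particular the rule for feeding the LoL result back into the first-level sampler (here, a per-document reweighting of the topic term in the CGS probabilities) are assumptions made for this sketch.

import numpy as np

rng = np.random.default_rng(0)

def cgs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, prior=None):
    # One collapsed Gibbs sweep over every token. `docs` holds word-id
    # lists, `z` the parallel topic assignments. `prior` is an optional
    # (D, K) matrix reweighting the per-document topic term; this is our
    # assumed channel for the LoL feedback, not the paper's exact rule.
    K, V = n_kw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1  # exclude token
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            if prior is not None:
                p = p * prior[d]
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1  # add back

def count(docs, z, D, K, V):
    # Build the count matrices that CGS maintains.
    n_dk = np.zeros((D, K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw, n_k

def pretrain(docs, V, K=50, K2=10, alpha=0.1, beta=0.01, pre_iters=20):
    D = len(docs)
    z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
    n_dk, n_kw, n_k = count(docs, z, D, K, V)
    for _ in range(pre_iters):
        # (1) one ordinary CGS iteration for the first-level LDA
        cgs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta)
        # (2) regard z as observed "words" over a K-word vocabulary and
        # run one CGS sweep of a second LDA (LoL) built over them
        z2 = [[int(rng.integers(K2)) for _ in zd] for zd in z]
        m_dk, m_kw, m_k = count(z, z2, D, K2, K)
        cgs_sweep(z, z2, m_dk, m_kw, m_k, alpha, beta)
        # (3) update the first-level latent variables using the LoL
        # result; here (an assumption) the LoL posterior mean over the
        # K first-level topics acts as a per-document prior
        theta2 = (m_dk + alpha) / (m_dk + alpha).sum(1, keepdims=True)
        phi2 = (m_kw + beta) / (m_kw + beta).sum(1, keepdims=True)
        cgs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta,
                  prior=theta2 @ phi2)
    return z, (n_dk, n_kw, n_k)

After pre_iters such iterations, training would continue with step (1) alone, i.e., the usual CGS iterations for LDA that the abstract describes as following the pretraining stage.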




Author information

Correspondence to Tomonari Masada.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Masada, T., Takasu, A. (2015). Heuristic Pretraining for Topic Models. In: Ali, M., Kwon, Y., Lee, C.-H., Kim, J., Kim, Y. (eds) Current Approaches in Applied Artificial Intelligence. IEA/AIE 2015. Lecture Notes in Computer Science, vol 9101. Springer, Cham. https://doi.org/10.1007/978-3-319-19066-2_34

  • DOI: https://doi.org/10.1007/978-3-319-19066-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19065-5

  • Online ISBN: 978-3-319-19066-2

  • eBook Packages: Computer Science (R0)
