
Heuristic Pretraining for Topic Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9101)

Abstract

This paper proposes a heuristic pretraining method for topic models. While we consider latent Dirichlet allocation (LDA) here, the pretraining can be applied to other topic models. Basically, we use collapsed Gibbs sampling (CGS) to update the latent variables. However, after every iteration of CGS, we regard the latent variables as observable and construct another LDA over them, which we call LDA over LDA (LoL). We then perform two further updates: an update of the latent variables in LoL by CGS, and an update of the latent variables in LDA based on the result of that LoL update. One iteration of CGS for LDA and these two updates are performed alternately, but only during a short early phase of inference; that is, the proposed method serves as a pretraining. The pretraining stage is followed by the usual iterations of CGS for LDA. Our evaluation experiments show that the pretraining improves test-set perplexity.
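To make the alternating scheme concrete, here is a minimal Python sketch of the two-level loop. It is an illustration under stated assumptions, not the authors' implementation: the corpus format, the hyperparameter values, and in particular the rule for feeding the LoL result back into the first-level sampler (here, a per-document reweighting of the topic term in the CGS probabilities) are assumptions made for this sketch.

import numpy as np

rng = np.random.default_rng(0)

def cgs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, prior=None):
    # One collapsed Gibbs sweep over every token. `docs` holds word-id
    # lists, `z` the parallel topic assignments. `prior` is an optional
    # (D, K) matrix reweighting the per-document topic term; this is our
    # assumed channel for the LoL feedback, not the paper's exact rule.
    K, V = n_kw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1  # exclude token
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            if prior is not None:
                p = p * prior[d]
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1  # add back

def count(docs, z, D, K, V):
    # Build the count matrices that CGS maintains.
    n_dk = np.zeros((D, K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw, n_k

def pretrain(docs, V, K=50, K2=10, alpha=0.1, beta=0.01, pre_iters=20):
    D = len(docs)
    z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
    n_dk, n_kw, n_k = count(docs, z, D, K, V)
    for _ in range(pre_iters):
        # (1) one ordinary CGS iteration for the first-level LDA
        cgs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta)
        # (2) regard z as observed "words" over a K-word vocabulary and
        # run one CGS sweep of a second LDA (LoL) built over them
        z2 = [[int(rng.integers(K2)) for _ in zd] for zd in z]
        m_dk, m_kw, m_k = count(z, z2, D, K2, K)
        cgs_sweep(z, z2, m_dk, m_kw, m_k, alpha, beta)
        # (3) update the first-level latent variables using the LoL
        # result; here (an assumption) the LoL posterior mean over the
        # K first-level topics acts as a per-document prior
        theta2 = (m_dk + alpha) / (m_dk + alpha).sum(1, keepdims=True)
        phi2 = (m_kw + beta) / (m_kw + beta).sum(1, keepdims=True)
        cgs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta,
                  prior=theta2 @ phi2)
    return z, (n_dk, n_kw, n_k)

After pre_iters such iterations, training would continue with step (1) alone, i.e., the usual CGS iterations for LDA that the abstract describes as following the pretraining stage.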




Author information

Correspondence to Tomonari Masada.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Masada, T., Takasu, A. (2015). Heuristic Pretraining for Topic Models. In: Ali, M., Kwon, Y., Lee, C.-H., Kim, J., Kim, Y. (eds) Current Approaches in Applied Artificial Intelligence. IEA/AIE 2015. Lecture Notes in Computer Science, vol 9101. Springer, Cham. https://doi.org/10.1007/978-3-319-19066-2_34

  • DOI: https://doi.org/10.1007/978-3-319-19066-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19065-5

  • Online ISBN: 978-3-319-19066-2

  • eBook Packages: Computer Science (R0)
