Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow the number of topics to be discovered automatically from the data. The computational complexity of standard Gibbs sampling techniques for model training is linear in the number of topics. Recently, it was reduced to be linear in the number of topics per word by combining a technique called alias sampling with Metropolis–Hastings (MH) sampling. We propose a different proposal distribution for the MH step, based on the observation that distributions on the upper level of the hierarchy change more slowly than the document-specific distributions at the lower level. This reduces the sampling complexity further, making it linear in the number of topics per document. By utilizing a single global distribution, we are able to further improve the test set log-likelihood of this approximation. Furthermore, we propose a novel model of stacked HDPs utilizing this sampling method. An extensive analysis reveals the importance of correctly setting the hyperparameters for classification and shows the convergence properties of our method. Experiments demonstrate the effectiveness of the proposed approach for multi-label classification compared to previous Dependency-LDA models.
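
As a rough illustration of the sampling scheme described above, the following sketch draws a proposal in \(O(1)\) from an alias table built over a stale, document-independent topic distribution, then accepts or rejects it with the standard Metropolis–Hastings ratio against the true conditional. All function and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def build_alias_table(probs):
    """Walker's alias method: O(K) construction, O(1) sampling thereafter."""
    K = len(probs)
    cutoff = np.asarray(probs, dtype=float) * K / np.sum(probs)
    alias = np.zeros(K, dtype=int)
    small = [k for k in range(K) if cutoff[k] < 1.0]
    large = [k for k in range(K) if cutoff[k] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                       # overflow mass of bin s comes from l
        cutoff[l] -= 1.0 - cutoff[s]
        (small if cutoff[l] < 1.0 else large).append(l)
    for k in small + large:                # leftovers are exactly full bins
        cutoff[k] = 1.0
    return cutoff, alias

def alias_draw(cutoff, alias, rng):
    """One O(1) draw from the alias table."""
    k = rng.integers(len(cutoff))
    return k if rng.random() < cutoff[k] else alias[k]

def mh_topic_step(z_old, true_cond, proposal, cutoff, alias, rng):
    """Propose from the stale table, accept with the standard MH ratio."""
    z_new = alias_draw(cutoff, alias, rng)
    ratio = (true_cond[z_new] * proposal[z_old]) / (true_cond[z_old] * proposal[z_new])
    return z_new if rng.random() < min(1.0, ratio) else z_old

# Usage: rebuild the table only occasionally, so most draws cost O(1).
rng = np.random.default_rng(0)
proposal = np.array([0.5, 0.3, 0.2])     # stale global topic distribution
true_cond = np.array([0.45, 0.35, 0.2])  # current (unnormalized) true conditional
cutoff, alias = build_alias_table(proposal)
z = mh_topic_step(0, true_cond, proposal, cutoff, alias, rng)
```

Because the table is rebuilt only occasionally, the per-token work is dominated by evaluating the sparse document-specific part of the conditional, which is the intuition behind the "topics per document" complexity stated above.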

Notes

  1. See Buntine and Hutter [3] for an efficient way to compute ratios of these numbers. They can be precomputed once and subsequently retrieved in \(O(1)\); a minimal sketch of such a precomputation follows after these notes. Note that it may be necessary to store large values sparsely if the number of tokens in a restaurant becomes very large.

  2. This improved method can also be applied if \(a>0\), i.e., if we are dealing with a hierarchical Poisson–Dirichlet topic model. In this case, we need to divide \(q\) by \((b_1+M_d)\) and multiply this factor back in when subtracting \(q\) from \(p\) (also sketched after these notes).

  3. See Papanikolaou et al. [15] for a formal justification of this approach.
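
As a rough illustration of the precomputation in note 1, the following sketch builds generalized Stirling numbers for the Poisson–Dirichlet process in log space (rather than the sparse storage mentioned above) via the recurrence \(S_a(n+1,m) = S_a(n,m-1) + (n - ma)\,S_a(n,m)\), so that ratios can then be read off in \(O(1)\). Function names and the log-space design are assumptions for the sketch, not the paper's code.

```python
import numpy as np

def log_stirling_table(N, a=0.0):
    """Precompute log S_a(n, m) for n, m <= N; a=0 recovers the plain DP case."""
    logS = np.full((N + 1, N + 1), -np.inf)
    logS[0, 0] = 0.0  # S_a(0, 0) = 1
    for n in range(N):
        for m in range(1, n + 2):
            terms = [logS[n, m - 1]]
            if n - m * a > 0:
                terms.append(np.log(n - m * a) + logS[n, m])
            logS[n + 1, m] = np.logaddexp.reduce(terms)
    return logS

def stirling_ratio(logS, n, m):
    """S_a(n, m+1) / S_a(n, m), retrieved in O(1) after precomputation."""
    return np.exp(logS[n, m + 1] - logS[n, m])
```

For the adjustment in note 2, a literal reading is sketched below; \(p\), \(q\), \(b_1\) and \(M_d\) are assumed to be the full conditional mass, its document-independent part, the top-level concentration and the document's token count, and all names are hypothetical.

```python
def split_mass_pdp(p, q, b1, M_d):
    """Rescale q for the a > 0 (Poisson-Dirichlet) case, as in note 2."""
    q_scaled = q / (b1 + M_d)               # divide q by (b_1 + M_d)
    p_specific = p - q_scaled * (b1 + M_d)  # multiply the factor back when subtracting
    return p_specific, q_scaled
```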

References

  1. Antoniak CE (1974) Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann Stat 2(6):1152–1174

  2. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022

  3. Buntine W, Hutter M (2010) A Bayesian view of the Poisson–Dirichlet process. arXiv preprint arXiv:1007.0296

  4. Buntine WL, Mishra S (2014) Experiments with non-parametric topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 881–890

  5. Burkhardt S, Kramer S (2017) Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity. In: ICBK 2017, International Conference on Big Knowledge, IEEE, pp 1–8

  6. Burkhardt S, Kramer S (2017) Online sparse collapsed hybrid variational-Gibbs algorithm for hierarchical Dirichlet process topic models. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Proceedings of ECML-PKDD 2017. Springer International Publishing, Cham, pp 189–204

  7. Chen C, Du L, Buntine W (2011) Sampling table configurations for the hierarchical Poisson–Dirichlet process. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Proceedings of ECML-PKDD. Springer, Heidelberg, pp 296–311

  8. Katakis I, Tsoumakas G, Vlahavas I (2008) Multilabel text classification for automated tag suggestion. In: ECML-PKDD discovery challenge, vol 75

  9. Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397

  10. Li AQ, Ahmed A, Ravi S, Smola AJ (2014) Reducing the sampling complexity of topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 891–900

  11. Li C, Cheung WK, Ye Y, Zhang X, Chu D, Li X (2015) The author-topic-community model for author interest profiling and community discovery. Knowl Inf Syst 44(2):359–383

  12. Li W (2007) Pachinko allocation: DAG-structured mixture models of topic correlations. Ph.D. thesis, University of Massachusetts Amherst

  13. Loza Mencía E, Fürnkranz J (2010) Efficient multilabel classification algorithms for large-scale problems in the legal domain. In: Francesconi E, Montemagni S, Peters W, Tiscornia D (eds) Semantic processing of legal texts—where the language of law meets the law of language. Lecture notes in artificial intelligence, vol 6036, 1st edn. Springer, pp 192–215

  14. Nam J, Kim J, Loza Mencía E, Gurevych I, Fürnkranz J (2014) Large-scale multi-label text classification—revisiting neural networks. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Proceedings of ECML-PKDD, part II. Springer, Heidelberg, pp 437–452

  15. Papanikolaou Y, Foulds JR, Rubin TN, Tsoumakas G (2015) Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA. arXiv e-prints

  16. Prabhu Y, Varma M (2014) FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, ACM, New York, NY, USA, pp 263–272

  17. Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of the 2009 conference on empirical methods in natural language processing: volume 1, EMNLP ’09, Association for Computational Linguistics, Stroudsburg, PA, USA, pp 248–256

  18. Ramage D, Manning CD, Dumais S (2011) Partially labeled topic models for interpretable text mining. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’11, ACM, New York, NY, USA, pp 457–465

  19. Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359

  20. Ren L, Dunson DB, Carin L (2008) The dynamic hierarchical Dirichlet process. In: Proceedings of the 25th international conference on machine learning (ICML), ACM, pp 824–831

  21. Rodríguez A, Dunson DB, Gelfand AE (2008) The nested Dirichlet process. J Am Stat Assoc 103(483):1131–1154

  22. Rubin TN, Chambers A, Smyth P, Steyvers M (2012) Statistical topic models for multi-label document classification. Mach Learn 88(1–2):157–208

  23. Salakhutdinov R, Tenenbaum JB, Torralba A (2013) Learning with hierarchical-deep models. IEEE Trans Pattern Anal Mach Intell 35(8):1958–1971

  24. Shimosaka M, Tsukiji T, Tominaga S, Tsubouchi K (2016) Coupled hierarchical Dirichlet process mixtures for simultaneous clustering and topic modeling. In: Frasconi P, Landwehr N, Manco G, Vreeken J (eds) Proceedings of ECML-PKDD. Springer International Publishing, Cham, pp 230–246

  25. Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical Dirichlet processes. J Am Stat Assoc 101(476):1566–1581

  26. Tsoumakas G, Katakis I, Vlahavas IP (2008) Effective and efficient multilabel classification in domains with large number of labels. In: ECML/PKDD 2008 workshop on mining multidimensional data

  27. Wood F, Archambeau C, Gasthaus J, James L, Teh YW (2009) A stochastic memoizer for sequence data. In: Proceedings of the 26th international conference on machine learning (ICML), ACM, pp 1129–1136

  28. Yen IEH, Huang X, Ravikumar P, Zhong K, Dhillon I (2016) PD-Sparse: a primal and dual sparse approach to extreme multiclass and multilabel classification. In: Proceedings of the 33rd international conference on machine learning, ACM, pp 3069–3077

  29. Zuo Y, Zhao J, Xu K (2016) Word network topic model: a simple but general solution for short and imbalanced texts. Knowl Inf Syst 48(2):379–398

Acknowledgements

We thank the anonymous reviewers for useful comments and suggestions, Jinseok Nam for providing the source code of the neural network classifier and Andrey Tyukin for helpful discussions on Stirling numbers. The first author was supported by a scholarship from PRIME Research, Mainz.

Author information

Corresponding author

Correspondence to Sophie Burkhardt.

Additional information

This paper is an extended version of an ICBK 2017 conference paper [5].

About this article

Cite this article

Burkhardt, S., Kramer, S. Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity. Knowl Inf Syst 59, 93–115 (2019). https://doi.org/10.1007/s10115-018-1204-z
