Domain-Specific Term Rankings Using Topic Models

Liu, Zhiyuan; Sun, Maosong

doi:10.1007/978-3-642-17187-1_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6458)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

A widely used approach to keyword extraction and content-based tag recommendation is to rank terms according to statistical criteria. In many cases, documents such as news articles and product reviews belong to specific domains, and domain knowledge can be important information for term ranking. In this paper, we propose modeling domain knowledge with latent topic models, an approach we call the Domain-Topic Model (DTM). Using DTM, we perform domain-specific term ranking according to the relatedness between terms and domains. Experimental results on both keyword extraction and tag recommendation show the advantages of DTM for domain-specific term ranking.
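
The idea of ranking terms by their relatedness to a domain can be illustrated with a minimal sketch. This is not the DTM described in the paper: it performs no topic inference and simply estimates term-domain relatedness from smoothed counts, assuming a uniform prior over domains. The function names, the scoring rule (term frequency times P(domain | term)), and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def domain_term_probs(domain_docs, smoothing=1.0):
    """Estimate P(term | domain) from tokenized documents with additive smoothing."""
    vocab = {t for docs in domain_docs.values() for doc in docs for t in doc}
    probs = {}
    for domain, docs in domain_docs.items():
        counts = Counter(t for doc in docs for t in doc)
        total = sum(counts.values()) + smoothing * len(vocab)
        probs[domain] = {t: (counts[t] + smoothing) / total for t in vocab}
    return probs


def rank_terms(doc, domain, probs):
    """Rank distinct terms of `doc` by tf * P(domain | term), uniform domain prior (assumption)."""
    tf = Counter(doc)
    scores = {}
    for term, freq in tf.items():
        if term not in probs[domain]:
            continue  # term never seen in the domain collections
        # Bayes' rule with a uniform prior: normalize P(term | domain) over all domains.
        p_domain_given_term = probs[domain][term] / sum(p[term] for p in probs.values())
        scores[term] = freq * p_domain_given_term
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy corpus: two domains, two tokenized documents each.
domain_docs = {
    "sports": [["match", "team", "goal", "win"], ["team", "coach", "goal"]],
    "finance": [["stock", "market", "win", "loss"], ["market", "bank", "stock"]],
}
probs = domain_term_probs(domain_docs)
doc = ["team", "win", "market", "goal", "goal"]
print(rank_terms(doc, "sports", probs))  # ['goal', 'team', 'win', 'market']
```

In this toy setting, a term shared by both domains ("win") ranks below terms concentrated in the target domain, which is the intuition behind domain-specific ranking. A topic-model-based approach would replace the count-based estimates with distributions inferred by the model.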

Author information

Authors and Affiliations

Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Tsinghua University, Beijing, 100084, China
Zhiyuan Liu & Maosong Sun

Editor information

Editors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan R.O.C.
Pu-Jen Cheng
School of Computing, National University of Singapore (NUS), Computing 1, 13 Computing Drive, 117417, Singapore
Min-Yen Kan
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Wai Lam
School of Computing, Computing 1, National University of Singapore (NUS), 13 Computing Drive, 117417, Singapore
Preslav Nakov

Cite this paper

Liu, Z., Sun, M. (2010). Domain-Specific Term Rankings Using Topic Models. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_44

DOI: https://doi.org/10.1007/978-3-642-17187-1_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science, Computer Science (R0)
