
Wikipedia-Based Semantic Smoothing for the Language Modeling Approach to Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Abstract

Semantic smoothing is an effective way to improve retrieval performance in the language modeling approach to information retrieval. Previous methods, such as the translation model, use individual terms or phrases for semantic mapping. These models handle ambiguous words and phrases poorly because they cannot incorporate contextual information. To overcome this limitation, we propose a novel Wikipedia-based semantic smoothing method that decomposes a document into a set of weighted Wikipedia concepts and then maps those unambiguous concepts to query terms. The mapping probabilities from each Wikipedia concept to individual terms are estimated with the EM algorithm, and document models based on Wikipedia concept mapping are then derived. The new smoothing method is evaluated on the TREC Ad Hoc Track (Disks 1, 2, and 3) collections. Experiments show significant improvements over both the two-stage language model and the language model with translation-based semantic smoothing.
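The abstract names the two steps (EM estimation of concept-to-term mapping probabilities, then a concept-based document model) but does not give the formulas. The following is a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes the text associated with a concept is generated by a two-component mixture of a concept model and a background collection model, in the style of topic-signature smoothing. The function names, the mixing weight `alpha`, and the toy data layout are all hypothetical.

```python
from collections import defaultdict


def estimate_concept_term_probs(term_counts, background, alpha=0.5,
                                iterations=20, floor=1e-9):
    """EM sketch for p(w|c) under an ASSUMED mixture:
    p(w) = (1 - alpha) * p(w|c) + alpha * p(w|C),
    where p(w|C) is a fixed background (collection) model.

    term_counts: counts of terms in text associated with concept c.
    background:  background probabilities p(w|C); missing terms get `floor`.
    """
    total = sum(term_counts.values())
    # Initialize with the maximum-likelihood estimate.
    p_w_c = {w: n / total for w, n in term_counts.items()}
    for _ in range(iterations):
        # E-step: posterior that an occurrence of w came from the concept model.
        expected = {}
        for w, n in term_counts.items():
            concept_part = (1 - alpha) * p_w_c[w]
            background_part = alpha * background.get(w, floor)
            z = concept_part / (concept_part + background_part)
            expected[w] = n * z
        # M-step: renormalize the expected concept-generated counts.
        norm = sum(expected.values())
        p_w_c = {w: e / norm for w, e in expected.items()}
    return p_w_c


def concept_document_model(concept_weights, mapping):
    """Concept-based document model (ASSUMED form):
    p(w|d) = sum over concepts c of p(w|c) * p(c|d),
    with p(c|d) taken as the normalized concept weight in the document.
    """
    total = sum(concept_weights.values())
    p_w_d = defaultdict(float)
    for concept, weight in concept_weights.items():
        for w, p in mapping[concept].items():
            p_w_d[w] += p * weight / total
    return dict(p_w_d)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    counts = {"jaguar": 5, "car": 3, "the": 10}
    collection = {"jaguar": 0.001, "car": 0.01, "the": 0.5}
    probs = estimate_concept_term_probs(counts, collection)
    print(sorted(probs.items(), key=lambda kv: -kv[1]))
```

Under this assumed mixture, the background component absorbs common words such as "the", so the estimated mapping concentrates on terms specific to the concept. In practice the resulting concept-based document model would be interpolated with a conventionally smoothed document model; the interpolation scheme is not specified in the abstract and is left out here.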




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tu, X., He, T., Chen, L., Luo, J., Zhang, M. (2010). Wikipedia-Based Semantic Smoothing for the Language Modeling Approach to Information Retrieval. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_33


  • DOI: https://doi.org/10.1007/978-3-642-12275-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12274-3

  • Online ISBN: 978-3-642-12275-0

  • eBook Packages: Computer Science (R0)
