Document Language Models, Query Models, and Risk Minimization for Information Retrieval

Authors:
John Lafferty, Carnegie Mellon University, Pittsburgh, PA
Chengxiang Zhai, Carnegie Mellon University, Pittsburgh, PA

ACM SIGIR Forum, Volume 51, Issue 2, July 2017, pp. 251–259. https://doi.org/10.1145/3130348.3130375

Published: 02 August 2017

Abstract

We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to information retrieval. A language model for each document is estimated, as well as a language model for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy, and word senses. While recent work has incorporated word translation models for this purpose, we introduce a new method using Markov chains defined on a set of documents to estimate the query models. The Markov chain method has connections to algorithms from link analysis and social networks. The new approach is evaluated on TREC collections and compared to the basic language modeling approach and vector space models together with query expansion using Rocchio. Significant improvements are obtained over standard query expansion methods for strong baseline TF-IDF systems, with the greatest improvements attained for short queries on Web data.
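
As an illustration of ranking by risk, the sketch below scores each document by the KL divergence between a query language model and a smoothed document language model, which is one common way to instantiate the framework. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Dirichlet smoothing weight `mu`, and the toy collection are hypothetical, and the query model is a plain maximum-likelihood estimate standing in for the Markov chain estimate described in the abstract.

```python
import math
from collections import Counter


def doc_model(doc_tokens, coll_counts, coll_len, mu=10.0):
    """Return p(w | d) as a function, using Dirichlet smoothing.

    mu is an assumed smoothing weight, not a value taken from the paper.
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens)

    def prob(word):
        background = coll_counts[word] / coll_len
        return (counts[word] + mu * background) / (n + mu)

    return prob


def query_model(query_tokens, vocabulary):
    """Maximum-likelihood p(w | q) over query words found in the collection."""
    counts = Counter(t for t in query_tokens if t in vocabulary)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def kl_risk(q_model, d_prob):
    """KL(query model || document model); lower means lower risk."""
    return sum(p * math.log(p / d_prob(w)) for w, p in q_model.items())


def rank(query_tokens, docs):
    """Order document ids by increasing KL risk for the query."""
    coll_counts = Counter(t for toks in docs.values() for t in toks)
    coll_len = sum(coll_counts.values())
    q_model = query_model(query_tokens, coll_counts)
    scored = [
        (kl_risk(q_model, doc_model(toks, coll_counts, coll_len)), doc_id)
        for doc_id, toks in docs.items()
    ]
    return [doc_id for _, doc_id in sorted(scored)]


# Hypothetical toy collection, used only to exercise the functions.
docs = {
    "d1": ["language", "model", "retrieval"],
    "d2": ["cooking", "pasta", "recipe"],
    "d3": ["language", "pasta"],
}
ranking = rank(["language", "model"], docs)
```

On this toy collection the document containing both query words, `d1`, is ranked first. A richer query model (for example, one that spreads probability onto related words) would plug into `kl_risk` without changing the ranking code.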

Published in
ACM SIGIR Forum, Volume 51, Issue 2
SIGIR Test-of-Time Awardees 1978-2001
July 2017
276 pages
ISSN: 0163-5840
DOI: 10.1145/3130348
Editors:
Donna Harman, National Institute of Standards and Technology, Gaithersburg MD, USA
Diane Kelly, University of Tennessee, Knoxville TN, USA
Copyright © 2017. Copyright is held by the owner/author(s).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States