Modeling reformulation using query distributions

Authors:
Xiaobing Xue, University of Massachusetts, Amherst, MA
W. Bruce Croft, University of Massachusetts, Amherst, MA

ACM Transactions on Information Systems, Volume 31, Issue 2, Article No. 6, pp. 1–34. https://doi.org/10.1145/2457465.2457466

Published: 17 May 2013

Abstract

Query reformulation modifies the original query with the aim of better matching the vocabulary of the relevant documents, and consequently improving ranking effectiveness. Previous models typically generate words and phrases related to the original query, but do not consider how these words and phrases would fit together in actual queries. In this article, a novel framework is proposed that models reformulation as a distribution of actual queries, where each query is a variation of the original query. This approach considers an actual query as the basic unit and thus captures important query-level dependencies between words and phrases. An implementation of this framework that only uses publicly available resources is proposed, which makes fair comparisons with other methods using TREC collections possible. Specifically, this implementation consists of a query generation step that analyzes the passages containing query words to generate reformulated queries and a probability estimation step that learns a distribution for reformulated queries by optimizing the retrieval performance. Experiments on TREC collections show that the proposed model can significantly outperform previous reformulation models.
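
As an illustration of the framework described in the abstract, the sketch below treats a reformulation as a probability distribution over complete queries and scores a document by the probability-weighted sum of its per-query scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the Dirichlet-smoothed query-likelihood scorer, the function names, the toy documents, and the hand-set probabilities are all hypothetical. In the article the probabilities are learned by optimizing retrieval performance, a step that is not shown here.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc_tokens, coll_counts, coll_len, mu=10.0):
    """Dirichlet-smoothed log P(query | document) over whitespace tokens."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    vocab_size = len(coll_counts)
    score = 0.0
    for term in query.split():
        # Add-one smoothing keeps the collection probability of unseen terms nonzero.
        p_coll = (coll_counts.get(term, 0) + 1) / (coll_len + vocab_size + 1)
        score += math.log((doc_counts[term] + mu * p_coll) / (doc_len + mu))
    return score


def score_with_query_distribution(query_dist, doc_tokens, coll_counts, coll_len):
    """Probability-weighted sum of per-query scores (assumed combination rule)."""
    total = sum(query_dist.values())  # normalize in case weights do not sum to 1
    return sum(
        (prob / total) * query_log_likelihood(q, doc_tokens, coll_counts, coll_len)
        for q, prob in query_dist.items()
    )


def rank(query_dist, docs):
    """Return document ids sorted from highest to lowest score."""
    coll_counts = Counter(tok for toks in docs.values() for tok in toks)
    coll_len = sum(coll_counts.values())
    scores = {
        doc_id: score_with_query_distribution(query_dist, toks, coll_counts, coll_len)
        for doc_id, toks in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy collection and a hand-set distribution over reformulated queries (hypothetical).
docs = {
    "d1": "cheap airline tickets to boston".split(),
    "d2": "history of the boston tea party".split(),
}
query_dist = {
    "cheap flights boston": 0.5,
    "cheap airline tickets boston": 0.3,
    "low cost airfare boston": 0.2,
}
ranking = rank(query_dist, docs)
print(ranking)
```

Because every reformulated query is scored as a whole before the weighting is applied, each variant's combination of words and phrases stays intact, which is the query-level dependency the abstract emphasizes.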

Index Terms

1. Information systems
  1. Information retrieval

Published in

ACM Transactions on Information Systems Volume 31, Issue 2
May 2013
180 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/2457465

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 May 2013
- Accepted: 1 November 2012
- Revised: 1 June 2012
- Received: 1 November 2011
Author Tags
Query reformulation
information retrieval
passage analysis
query segmentation
query substitution
Qualifiers
- research-article
- Research
- Refereed