Abstract
Query reformulation modifies the original query with the aim of better matching the vocabulary of the relevant documents and consequently improving ranking effectiveness. Previous models typically generate words and phrases related to the original query but do not consider how these words and phrases would fit together in actual queries. In this article, a novel framework is proposed that models reformulation as a distribution over actual queries, where each query is a variation of the original query. This approach treats an actual query as the basic unit and thus captures important query-level dependencies between words and phrases. An implementation of this framework that uses only publicly available resources is also proposed; this makes fair comparisons with other methods on TREC collections possible. Specifically, the implementation consists of a query generation step, which analyzes the passages containing query words to generate reformulated queries, and a probability estimation step, which learns a distribution over the reformulated queries by optimizing retrieval performance. Experiments on TREC collections show that the proposed model can significantly outperform previous reformulation models.
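The abstract describes the model only at a high level. The following is a minimal sketch of the general idea, not the article's implementation, and it rests on several assumptions: a reformulated query is a tuple of words; generation is simplified to keeping the original query words that each passage contains (a stand-in for the article's passage analysis); the distribution is log-linear over two hypothetical features; and a document is scored by the expected retrieval score, i.e. the sum over reformulations of P(q') times a Dirichlet-smoothed query log-likelihood. All function names (`generate_reformulations`, `query_distribution`, `ql_score`, `rank`) and the feature set are illustrative.

```python
import math
from collections import Counter


def generate_reformulations(query, passages):
    """Hypothetical, simplified generation step: every passage that contains at
    least one query word yields the tuple of query words it contains (kept in
    query order). Returns a Counter: reformulated query -> number of passages
    that produced it."""
    q_words = query.lower().split()
    variants = Counter()
    for passage in passages:
        p_words = set(passage.lower().split())
        kept = tuple(w for w in q_words if w in p_words)
        if kept:
            variants[kept] += 1
    if not variants:
        # No passage matched: fall back to the original query alone.
        variants[tuple(q_words)] = 1
    return variants


def query_distribution(query, variants, weights):
    """Log-linear distribution P(q') proportional to exp(w . f(q')).
    The two features are hypothetical: the fraction of original query words
    kept, and log(1 + passage support)."""
    n = len(query.split())
    scores = {}
    for q, support in variants.items():
        feats = (len(q) / n, math.log(1 + support))
        scores[q] = sum(w * f for w, f in zip(weights, feats))
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {q: math.exp(s - top) / z for q, s in scores.items()}


def ql_score(doc_tf, q, coll_tf, coll_len, mu=2000.0):
    """Dirichlet-smoothed query log-likelihood log P(q | D). The +0.5 / +1
    adjustment to the collection model keeps unseen words from giving log(0)."""
    dlen = sum(doc_tf.values())
    total = 0.0
    for w in q:
        p_coll = (coll_tf[w] + 0.5) / (coll_len + 1)
        total += math.log((doc_tf[w] + mu * p_coll) / (dlen + mu))
    return total


def rank(query, docs, passages, weights=(1.0, 1.0)):
    """Score each document by the expected query log-likelihood under the query
    distribution, sum over q' of P(q') * log P(q' | D); return document indices,
    best first."""
    dist = query_distribution(query, generate_reformulations(query, passages), weights)
    doc_tfs = [Counter(d.lower().split()) for d in docs]
    coll_tf = sum(doc_tfs, Counter())
    coll_len = sum(coll_tf.values())
    scored = []
    for i, tf in enumerate(doc_tfs):
        s = sum(p * ql_score(tf, q, coll_tf, coll_len) for q, p in dist.items())
        scored.append((s, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

For example, with the query "jaguar car speed" and the passages "the jaguar car was fast", "jaguar speed in the wild", and "car speed limits", the sketch produces three two-word reformulations, each supported by one passage, so each receives probability 1/3. The `weights` are fixed by hand here; learning them by optimizing a retrieval metric, as the abstract describes, is deliberately left out.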
Index Terms
- Modeling reformulation using query distributions