Abstract
Statistical translation models have been shown to outperform simple document language models, which rely on exact matching of words in the query and documents. A main challenge in applying translation models to ad hoc information retrieval is estimating a translation model without training data. In this paper, we perform an axiomatic analysis of the translation language model for retrieval to gain insight into how to optimize the estimation of translation probabilities. We propose a set of constraints that a reasonable translation language model should satisfy. We check these constraints against the state-of-the-art translation estimation method based on Mutual Information and find that it does not satisfy most of them. We then propose a new estimation method that better satisfies the defined constraints. Experimental results on representative TREC data sets show that the proposed estimation method outperforms the existing Mutual Information-based estimation, suggesting that the proposed constraints are indeed helpful for designing better estimation methods for translation language models.
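To make the contrast with exact matching concrete, the following is a minimal sketch of query-likelihood scoring under a translation language model, in which a query word w can be generated from any document word u through a translation probability p_t(w|u), so that p(w|d) = sum over u of p_t(w|u) p_ml(u|d). The function name, the toy probabilities, and the use of Jelinek-Mercer smoothing with weight `lam` are illustrative assumptions, not the estimation method or settings of the paper.

```python
import math
from collections import Counter

def translation_lm_score(query, doc, p_t, collection_prob, lam=0.5):
    """Log query likelihood under a translation language model.

    p_t[u][w] is the (assumed, hypothetical) probability of translating
    document word u into query word w. Jelinek-Mercer smoothing with the
    collection model is an assumption made for this sketch.
    """
    counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        # p(w|d) = sum_u p_t(w|u) * p_ml(u|d), with p_ml(u|d) = c(u,d) / |d|
        p_w_d = sum(p_t.get(u, {}).get(w, 0.0) * c / doc_len
                    for u, c in counts.items())
        # Interpolate with a background collection model to avoid log(0)
        p = (1 - lam) * p_w_d + lam * collection_prob.get(w, 1e-9)
        score += math.log(p)
    return score

# Toy data (made up for illustration only)
doc = ["car", "engine"]
query = ["auto"]
collection_prob = {"auto": 0.01}

# Translation table that lets "car" generate "auto"
p_t_related = {"car": {"car": 0.7, "auto": 0.3}, "engine": {"engine": 1.0}}
# Identity table: each word translates only to itself (exact matching)
p_t_identity = {"car": {"car": 1.0}, "engine": {"engine": 1.0}}

score_related = translation_lm_score(query, doc, p_t_related, collection_prob)
score_identity = translation_lm_score(query, doc, p_t_identity, collection_prob)
```

With the identity table the model reduces to exact matching, so the query word "auto" receives only the smoothed background probability; with the related table the document is credited for containing "car", which gives it a higher score.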
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karimzadehgan, M., Zhai, C. (2012). Axiomatic Analysis of Translation Language Model for Information Retrieval. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_23
DOI: https://doi.org/10.1007/978-3-642-28997-2_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science (R0)