A novel dependency language model for information retrieval

Journal of Zhejiang University-SCIENCE A

Abstract

This paper explores the application of term dependency in information retrieval (IR) and proposes a novel dependency retrieval model. The model extends the existing language modeling (LM) approach to IR by introducing dependency models for both the query and the document. Relevance between a document and a query is then evaluated by the Kullback-Leibler divergence between their dependency models. The paper introduces a novel hybrid dependency structure, which allows various forms of dependency to be integrated within a single framework. A method based on pseudo relevance feedback is also introduced for constructing the query dependency model. The basic idea is to take query-relevant, top-ranking sentences extracted from the top documents at retrieval time as an augmented representation of the query, from which the relationships between query terms are identified. A Markov Random Field (MRF) based approach is presented to ensure the relevance of the extracted sentences; it uses the association features between query terms within a sentence to evaluate the relevance of each sentence. The dependency retrieval model was compared with other traditional retrieval models, and experiments indicated that it produces significant improvements in retrieval effectiveness.
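The ranking principle named in the abstract, scoring a document by the Kullback-Leibler divergence between a query model and a document model, can be illustrated with a minimal sketch. The sketch below is not the paper's dependency model: it assumes plain unigram models with Dirichlet smoothing, and the function names (`smoothed_model`, `kl_divergence`, `rank`), the smoothing parameter `mu`, and the toy documents are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def smoothed_model(tokens, vocab, collection_model, mu=2000.0):
    """Dirichlet-smoothed unigram model P(w | text) over the vocabulary.

    Assumption: unigram models stand in for the paper's dependency models.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {
        w: (counts[w] + mu * collection_model[w]) / (total + mu)
        for w in vocab
    }


def kl_divergence(p, q):
    """KL(p || q), summed over the words with non-zero probability in p."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0.0)


def rank(query_tokens, docs):
    """Return document ids ordered by increasing KL(query || document)."""
    # The query is folded into the collection statistics so that every
    # query word has a non-zero smoothed probability in every document.
    all_tokens = [t for toks in docs.values() for t in toks] + list(query_tokens)
    vocab = set(all_tokens)
    coll_counts = Counter(all_tokens)
    collection_model = {w: coll_counts[w] / len(all_tokens) for w in vocab}

    # Maximum-likelihood query model (no feedback-based expansion here).
    q_counts = Counter(query_tokens)
    query_model = {w: c / len(query_tokens) for w, c in q_counts.items()}

    scores = {}
    for doc_id, tokens in docs.items():
        doc_model = smoothed_model(tokens, vocab, collection_model)
        # A lower divergence means a closer match, so negate it for sorting.
        scores[doc_id] = -kl_divergence(query_model, doc_model)
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": ["term", "dependency", "language", "model"],
    "d2": ["speech", "recognizer", "sparse", "data"],
}
ranking = rank(["dependency", "model"], docs)
```

In this toy collection, `d1` contains both query words and therefore has the smaller divergence from the query model. In the paper's setting, the two unigram distributions would be replaced by the query and document dependency models, with the query model built from feedback sentences rather than from the raw query alone.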

Author information

Corresponding author

Correspondence to Bu Jia-jun.

Additional information

Project (No. 2006CB303000) supported in part by the National Basic Research Program (973) of China

About this article

Cite this article

Cai, Kk., Bu, Jj., Chen, C. et al. A novel dependency language model for information retrieval. J. Zhejiang Univ. - Sci. A 8, 871–882 (2007). https://doi.org/10.1631/jzus.2007.A0871

  • DOI: https://doi.org/10.1631/jzus.2007.A0871
