Context-Based Query Using Dependency Structures Based on Latent Topic Model

  • Original Article
Journal on Data Semantics

Abstract

Many approaches have been proposed to improve and enhance information retrieval over text databases, but few investigations capture the context aspects of queries (of languages) directly. Here, we propose a new approach to retrieving contextual dependencies in Japanese based on a latent topic model. The key idea comes from dependency structures, which capture context in both the database and the queries. We examine some experimental results to confirm the effectiveness of the approach.
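
To make the general idea concrete, the following is a minimal sketch, not the method of this paper: it assumes that modifier-head dependency pairs extracted from sentences can be treated as the "words" of a latent topic model (here LDA via gensim), and that documents are ranked against a query by the similarity of their topic distributions. The toy dependency pairs, the gensim-based training, and the cosine ranking are all assumptions made for illustration; a real pipeline would obtain dependencies from a Japanese dependency parser such as KNP.

    # Toy sketch: dependency pairs ("modifier->head") used as topic-model tokens.
    from gensim import corpora, models
    import numpy as np

    # Toy documents, each a bag of dependency pairs (illustrative data only).
    docs = [
        ["映画->見る", "友達->見る", "昨日->見る"],
        ["本->読む", "図書館->読む", "漫画->読む"],
        ["漫画->描く", "作者->描く", "雑誌->載る"],
    ]

    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]

    # Small LDA model over dependency pairs; num_topics is chosen arbitrarily here.
    lda = models.LdaModel(corpus=corpus, id2word=dictionary,
                          num_topics=2, passes=50, random_state=0)

    def topic_vector(pairs):
        # Dense topic distribution for a bag of dependency pairs.
        bow = dictionary.doc2bow(pairs)
        return np.array([p for _, p in
                         lda.get_document_topics(bow, minimum_probability=0.0)])

    def rank(query_pairs):
        # Rank documents by cosine similarity between topic distributions.
        q = topic_vector(query_pairs)
        scores = []
        for i, d in enumerate(docs):
            v = topic_vector(d)
            scores.append((i, float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))))
        return sorted(scores, key=lambda s: -s[1])

    # The document about reading books/manga should rank first for this query.
    print(rank(["漫画->読む"]))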

Notes

  1. In other words, a topic does not mean a human-recognizable subject such as politics or airplanes, but rather a kind of cluster put together by some probabilistic measure.

  2. An early draft version of this work appeared as “Context-based Query using Dependency Structures based on Latent Topic Model” at the 2nd International Conference on Model and Data Engineering (MEDI2012), Poitiers, France. We have extended the comparison with several relevant investigations, revised the discussion section, and made some other minor changes.

  3. Here, a word is a syntactic unit.

  4. One exception is that any predicate should appear as the last verb.

  5. That is, we may generate dependencies based on this probability distribution.

  6. Here, we assume that the joint probability takes a naive Bayes form (a generic factorization of this kind is sketched after these notes).

  7. Clinton, ZeroZero, and Ashita appear, where the latter two words are the names of manga.
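
Regarding note 6: a naive-Bayes-style factorization of a joint probability, written here in generic notation (these symbols are placeholders, not the paper's own notation), typically reads as

    P(w_1, \ldots, w_n, z) \;\approx\; P(z) \prod_{i=1}^{n} P(w_i \mid z),
    \qquad\text{i.e.}\qquad
    P(w_1, \ldots, w_n \mid z) \;\approx\; \prod_{i=1}^{n} P(w_i \mid z),

so that the components w_i are treated as conditionally independent given z.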

Acknowledgments

The authors deeply thank the reviewers of the journal and the MEDI2012 conference for their helpful comments; it felt as if all of us had completed this investigation as a joint work.

Author information

Corresponding author

Correspondence to Masato Shirai.

About this article

Cite this article

Shirai, M., Yanagisawa, T. & Miura, T. Context-Based Query Using Dependency Structures Based on Latent Topic Model. J Data Semant 3, 157–168 (2014). https://doi.org/10.1007/s13740-013-0031-3
