RI for IR: Capturing Term Contexts Using Random Indexing for Comprehensive Information Retrieval

  • Conference paper
Human-Inspired Computing and Its Applications (MICAI 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8856)

Abstract

In this paper, we present an approach, based on random indexing, to identify semantically related information that effectively disambiguates the user query and improves the retrieval efficiency of news documents. User query terms are expanded with terms of similar word sense, discovered by implicitly considering the “associatedness” of the document context with that of the given query. This associatedness is guided by word-space models, as described by Kanerva et al. (2000). The word-space model computes the meaning of terms by implicitly utilizing the distributional patterns (contexts) of words collected over large text data. The distributional patterns represent semantic similarity between words in terms of their spatial proximity in the context space. In this space, words are represented by context vectors whose relative directions are assumed to indicate semantic similarity. This follows the distributional hypothesis: words with similar meanings are assumed to have similar contexts. For example, if we observe two words that constantly occur in the same contexts, we are justified in assuming that they mean similar things. Hence the word-space methodology makes semantics computable, and the underlying models do not require any linguistic or semantic expertise. Experimental results on the FIRE news collection show that the proposed approach effectively captures term contexts using higher-order term associations across the collection of news documents and uses this information to assist the retrieval of documents.
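
The general random indexing idea referred to in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensionality, the number of non-zero entries, the window size, the toy corpus, and all function names (`index_vector`, `build_context_vectors`, `cosine`, `expand`) are assumptions chosen for illustration. Each term gets a sparse random index vector, a term's context vector is the sum of the index vectors of its neighbours, and query expansion candidates are the terms whose context vectors point in the most similar direction.

```python
import math
import random
from collections import defaultdict

DIM = 300      # dimensionality of the reduced space (assumed value)
NONZERO = 10   # number of +1/-1 entries per index vector (assumed value)
WINDOW = 2     # context window on each side of a term (assumed value)


def index_vector(term, dim=DIM, nonzero=NONZERO):
    """Sparse ternary index vector, stored as {position: +1 or -1}."""
    rng = random.Random(term)  # seeding with the term keeps the vector repeatable
    positions = rng.sample(range(dim), nonzero)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def build_context_vectors(sentences, window=WINDOW, dim=DIM):
    """Sum the index vectors of every neighbour within the window."""
    context = defaultdict(lambda: [0.0] * dim)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in index_vector(tokens[j], dim).items():
                    context[term][pos] += sign
    return dict(context)


def cosine(u, v):
    """Cosine of the angle between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(term, context, k=3):
    """Return up to k terms whose context vectors are closest to the term's."""
    if term not in context:
        return []
    scored = [(cosine(context[term], vec), other)
              for other, vec in context.items() if other != term]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


if __name__ == "__main__":
    # Hypothetical toy corpus, only to show the call sequence.
    corpus = [
        ["bank", "approves", "loan"],
        ["bank", "rejects", "loan"],
        ["river", "floods", "village"],
    ]
    vectors = build_context_vectors(corpus, window=1)
    print(expand("approves", vectors, k=2))
```

In this sketch, "approves" and "rejects" share exactly the same neighbours, so their context vectors coincide, which is the situation the abstract describes as two words occurring in the same contexts. A real system would add term weighting and a much larger corpus, neither of which is specified in the abstract.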

References

  1. Johnson, W., Lindenstrauss, L.: Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics 26, 189–206 (1984)

  2. Sahlgren, M.: An introduction to random indexing. In: Methods and Applications of Semantic Indexing Workshop at 7th Int. Conf. on Terminology and Knowledge Eng., TKE 2005 (2005)

  3. Sahlgren, M.: The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. PhD thesis, Stockholm University (2006)

  4. Kanerva, P., Kristoferson, J., Holst, A.: Random indexing of text samples for latent semantic analysis. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society, pp. 103–106. Erlbaum (2000)

  5. Henriksson, A., Moen, H., Skeppstedt, M., Daudaravicius, V., Duneld, M.: Synonym extraction and abbreviation expansion with ensembles of semantic spaces. J. Biomedical Semantics 5, 6 (2014)

  6. Sahlgren, M., Karlgren, J.: Automatic bilingual lexicon acquisition using random indexing of parallel corpora. Natural Language Engineering 11(3), 327–341 (2005)

  7. Vasuki, V., Cohen, T.: Reflective random indexing for semi-automatic indexing of the biomedical literature. J. of Biomedical Informatics 43(5), 694–700 (2010)

  8. Vasuki, V., Cohen, T.: Reflective random indexing for semi-automatic indexing of the biomedical literature. Journal of Biomedical Informatics 43(5), 694–700 (2010)

  9. Sahlgren, M., Karlgren, J.: Vector-based semantic analysis using random indexing for cross-lingual query expansion. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) CLEF 2001. LNCS, vol. 2406, pp. 169–176. Springer, Heidelberg (2002)

  10. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)

  11. Sahlgren, M., Cöster, R.: Using bag-of-concepts to improve the performance of support vector machines in text categorization. In: Proceedings of the 20th International Conference on Computational Linguistics. COLING 2004. Association for Computational Linguistics, Stroudsburg (2004)

  12. Sahlgren, M., Karlgren, J.: Automatic bilingual lexicon acquisition using random indexing of parallel corpora. Nat. Lang. Eng. 11, 327–341 (2005)

  13. Sahlgren, M., Karlgren, J., Cöster, R., Järvinen, T.: SICS at CLEF 2002: Automatic query expansion using random indexing. In: Peters, C., Braschler, M., Gonzalo, J. (eds.) CLEF 2002. LNCS, vol. 2785, pp. 311–320. Springer, Heidelberg (2003)

  14. Singhal, A., Salton, G., Mitra, M., Buckley, C.: Document length normalization. Inf. Process. Manage. 32(5), 619–633 (1996)

  15. Majumder, P., M.M., Dataa, K.: Multilingual information access: an Indian language perspective. In: Proc. ACM SIGIR Workshop on New Directions in Multilingual Information Access, Seattle, pp. 22–27 (2006)

  16. Majumder, P., Mitra, M., Pal, D., Bandyopadhyay, A., Maiti, S., Pal, S., Modak, D., Sanyal, S.: The FIRE 2008 evaluation exercise. ACM Transactions on Asian Language Information Processing (TALIP) 9, 10:1–10:24 (2010)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Prasath, R., Sarkar, S., O’Reilly, P. (2014). RI for IR: Capturing Term Contexts Using Random Indexing for Comprehensive Information Retrieval. In: Gelbukh, A., Espinoza, F.C., Galicia-Haro, S.N. (eds) Human-Inspired Computing and Its Applications. MICAI 2014. Lecture Notes in Computer Science (LNAI), vol 8856. Springer, Cham. https://doi.org/10.1007/978-3-319-13647-9_12

  • DOI: https://doi.org/10.1007/978-3-319-13647-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13646-2

  • Online ISBN: 978-3-319-13647-9

  • eBook Packages: Computer Science, Computer Science (R0)
