RI for IR: Capturing Term Contexts Using Random Indexing for Comprehensive Information Retrieval

Prasath, Rajendra; Sarkar, Sudeshna; O’Reilly, Philip

doi:10.1007/978-3-319-13647-9_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8856)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

In this paper, we present an approach, based on random indexing, to identify semantically related information that effectively disambiguates the user query and improves the retrieval efficiency of news documents. User query terms are expanded with terms that have similar word senses, which are discovered by implicitly considering the “associatedness” of the document context with that of the given query. This associatedness is guided by word-space models, as described by Kanerva et al. (2000). The word-space model computes the meaning of terms by implicitly utilizing the distributional patterns (contexts) of words collected over large text data. These distributional patterns represent semantic similarity between words in terms of their spatial proximity in the context space. In this space, words are represented by context vectors whose relative directions are assumed to indicate semantic similarity. This follows the distributional hypothesis, under which words with similar meanings are assumed to have similar contexts. For example, if we observe two words that consistently occur in the same contexts, we are justified in assuming that they mean similar things. Hence the word-space methodology makes semantics computable, and the underlying models do not require any linguistic or semantic expertise. Experimental results on the FIRE news collection show that the proposed approach effectively captures term contexts using higher-order term associations across the collection of news documents and uses this information to assist the retrieval of documents.
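
The general random indexing idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensionality, the number of non-zero entries, the window size, the toy corpus, and the function names (`index_vector`, `build_context_vectors`, `cosine`, `expand`) are all assumptions chosen for illustration. Each word receives a sparse random index vector, a word's context vector is the sum of the index vectors of its neighbours, and candidate expansion terms are the words whose context vectors point in the most similar direction.

```python
import math
import random
from collections import defaultdict

DIM = 300      # dimensionality of the reduced space (assumed value)
NONZERO = 10   # number of +1/-1 entries per index vector (assumed value)
WINDOW = 2     # context window on each side of a word (assumed value)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse ternary index vector for a word, stored as {position: +1 or -1}."""
    rng = random.Random(word)  # seeded by the word, so the vector is reproducible
    positions = rng.sample(range(dim), nonzero)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def build_context_vectors(sentences, window=WINDOW, dim=DIM):
    """Sum the index vectors of neighbouring words into each word's context vector."""
    context = defaultdict(lambda: [0.0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for pos, sign in index_vector(tokens[j], dim).items():
                        context[word][pos] += sign
    return dict(context)


def cosine(u, v):
    """Cosine of the angle between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(term, context, k=3):
    """Return the k terms whose context vectors are closest in direction to `term`."""
    if term not in context:
        return []
    scored = [(cosine(context[term], vec), other)
              for other, vec in context.items() if other != term]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    "the striker scored a goal in the match".split(),
    "the forward scored a goal in the final".split(),
    "the minister presented the budget in parliament".split(),
]
vectors = build_context_vectors(corpus)
print(expand("striker", vectors))
```

In this toy corpus, "striker" and "forward" have exactly the same neighbours within the window, so their context vectors coincide, which is the situation the distributional hypothesis treats as evidence of similar meaning. A realistic setting would use a large collection and typically a much higher dimensionality.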

References

Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics 26, 189–206 (1984)
Sahlgren, M.: An introduction to random indexing. In: Methods and Applications of Semantic Indexing Workshop at 7th Int. Conf. on Terminology and Knowledge Eng., TKE 2005 (2005)
Sahlgren, M.: The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. PhD thesis, Stockholm University (2006)
Kanerva, P., Kristoferson, J., Holst, A.: Random indexing of text samples for latent semantic analysis. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society, pp. 103–106. Erlbaum (2000)
Henriksson, A., Moen, H., Skeppstedt, M., Daudaravicius, V., Duneld, M.: Synonym extraction and abbreviation expansion with ensembles of semantic spaces. J. Biomedical Semantics 5, 6 (2014)
Sahlgren, M., Karlgren, J.: Automatic bilingual lexicon acquisition using random indexing of parallel corpora. Natural Language Engineering 11(3), 327–341 (2005)
Vasuki, V., Cohen, T.: Reflective random indexing for semi-automatic indexing of the biomedical literature. J. of Biomedical Informatics 43(5), 694–700 (2010)
Sahlgren, M., Karlgren, J.: Vector-based semantic analysis using random indexing for cross-lingual query expansion. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) CLEF 2001. LNCS, vol. 2406, pp. 169–176. Springer, Heidelberg (2002)
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
Sahlgren, M., Cöster, R.: Using bag-of-concepts to improve the performance of support vector machines in text categorization. In: Proceedings of the 20th International Conference on Computational Linguistics. COLING 2004. Association for Computational Linguistics, Stroudsburg (2004)
Sahlgren, M., Karlgren, J., Cöster, R., Järvinen, T.: SICS at CLEF 2002: Automatic query expansion using random indexing. In: Peters, C., Braschler, M., Gonzalo, J. (eds.) CLEF 2002. LNCS, vol. 2785, pp. 311–320. Springer, Heidelberg (2003)
Singhal, A., Salton, G., Mitra, M., Buckley, C.: Document length normalization. Inf. Process. Manage. 32(5), 619–633 (1996)
Majumder, P., Mitra, M., Datta, K.: Multilingual information access: an Indian language perspective. In: Proc. ACM SIGIR Workshop on New Directions in Multilingual Information Access, Seattle, pp. 22–27 (2006)
Majumder, P., Mitra, M., Pal, D., Bandyopadhyay, A., Maiti, S., Pal, S., Modak, D., Sanyal, S.: The FIRE 2008 evaluation exercise. ACM Transactions on Asian Language Information Processing (TALIP) 9, 10:1–10:24 (2010)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721 302, India
Rajendra Prasath & Sudeshna Sarkar
Department of Business Information Systems, University College Cork, Cork, Ireland
Rajendra Prasath & Philip O’Reilly

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan Dios Bátiz s/n, Col. Nueva Industrial Vallejo, 07738, Mexico City, Mexico
Alexander Gelbukh
Área Académica de Computación y Electrónica, Carretera Pachuca-Tulancingo, Universidad Autónoma del Estado de Hidalgo, Km. 4.5, Col. Carboneras, Mineral de la Reforma, 42180, Hidalgo, Mexico
Félix Castro Espinoza
Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad Universitaria, México DF, Mexico
Sofía N. Galicia-Haro

About this paper

Cite this paper

Prasath, R., Sarkar, S., O’Reilly, P. (2014). RI for IR: Capturing Term Contexts Using Random Indexing for Comprehensive Information Retrieval. In: Gelbukh, A., Espinoza, F.C., Galicia-Haro, S.N. (eds) Human-Inspired Computing and Its Applications. MICAI 2014. Lecture Notes in Computer Science, vol 8856. Springer, Cham. https://doi.org/10.1007/978-3-319-13647-9_12

DOI: https://doi.org/10.1007/978-3-319-13647-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13646-2
Online ISBN: 978-3-319-13647-9
eBook Packages: Computer Science, Computer Science (R0)
