Domain-Specific IR for German, English and Russian Languages

Fautsch, Claire; Dolamic, Ljiljana; Abdou, Samir; Savoy, Jacques

doi:10.1007/978-3-540-85760-0_26

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5152)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

In participating in this domain-specific track, our first objective is to propose and evaluate a light stemmer for the Russian language. Our second objective is to measure the relative merit of various search engines used for German and, to a lesser extent, English. To do so, we evaluated the tf·idf and Okapi models, IR models derived from the Divergence from Randomness (DFR) paradigm, and a language model (LM). For Russian, we find that word-based indexing with our light stemming procedure yields better retrieval effectiveness than the 4-gram indexing strategy (relative difference around 30%). Using the German corpus, we examine variations in retrieval effectiveness after applying the specialized thesaurus to automatically enlarge topic descriptions. In this case, the performance variations were relatively small and usually not significant.
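To illustrate the 4-gram indexing strategy that the abstract compares against word-based indexing, here is a minimal Python sketch. It is not the authors' implementation: the function names (`char_ngrams`, `index_terms`, `relative_difference`), the choice to build n-grams within each word rather than across word boundaries, and the rule of keeping short words whole are all assumptions made for illustration. The last helper only shows how a relative difference such as the reported "around 30%" is conventionally computed.

```python
def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Return the overlapping character n-grams of a single word.

    Words with at most n characters are kept whole
    (an assumption of this sketch, not taken from the paper).
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text: str, n: int = 4) -> list[str]:
    """Lowercase the text, split on whitespace, and emit n-grams word by word."""
    terms: list[str] = []
    for word in text.lower().split():
        terms.extend(char_ngrams(word, n))
    return terms


def relative_difference(baseline: float, other: float) -> float:
    """Relative change of `other` with respect to `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0
```

For example, `index_terms("Light stemming")` produces the indexing units `ligh`, `ight`, `stem`, `temm`, `emmi`, `mmin`, `ming`, whereas a word-based index with stemming would store one (stemmed) term per word.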

Author information

Authors and Affiliations

Computer Science Department, University of Neuchatel, Rue Emile Argand 11, 2009, Neuchatel, Switzerland
Claire Fautsch, Ljiljana Dolamic, Samir Abdou & Jacques Savoy

Editor information

Carol Peters, Valentin Jijkoun, Thomas Mandl, Henning Müller, Douglas W. Oard, Anselmo Peñas, Vivien Petras, Diana Santos

About this paper

Cite this paper

Fautsch, C., Dolamic, L., Abdou, S., Savoy, J. (2008). Domain-Specific IR for German, English and Russian Languages. In: Peters, C., et al. Advances in Multilingual and Multimodal Information Retrieval. CLEF 2007. Lecture Notes in Computer Science, vol 5152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85760-0_26

DOI: https://doi.org/10.1007/978-3-540-85760-0_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85759-4
Online ISBN: 978-3-540-85760-0
eBook Packages: Computer Science, Computer Science (R0)
