Abstract
A common strategy for implementing Cross-Language Information Retrieval (CLIR) systems is the so-called query translation approach. The user query is translated into each language present in the multilingual collection, and an independent monolingual retrieval process is run per language. This approach therefore partitions the documents by language, yielding as many collections as there are languages. After searching these corpora and obtaining one result list per language, the lists must be merged to provide a single list of retrieved documents.
In this paper, we propose an approach to obtain a single list of relevant documents for CLIR systems driven by query translation. This approach, which we call 2-step RSV (Retrieval Status Value), is based on re-indexing the retrieved documents according to the query vocabulary, and it performs noticeably better than traditional merging methods.
The proposed method requires query vocabulary alignment: for each word of a given query, we must know its translation or translations into the other languages. Because this is not always possible, we have also investigated a mixed model, which is applied to queries with only partial word-level alignment. The results show that even in this scenario, 2-step RSV performs better than traditional merging methods.
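As a rough illustration of the re-indexing idea, the sketch below pools the documents returned by the monolingual runs, re-indexes them using only the query vocabulary, and re-scores them in one shared index. It is a minimal sketch under stated assumptions, not the method as specified in the paper: the function name `two_step_rsv`, the grouping of each query word with its translations into a single "concept", and the simple tf-idf weighting are all assumptions made for illustration, and the mixed model for partially aligned queries is not covered.

```python
import math
from collections import Counter


def two_step_rsv(retrieved, concepts, top_k=10):
    """Re-rank documents from several monolingual runs in one shared index.

    retrieved: dict mapping doc_id -> list of tokens, i.e. the documents
               returned by the first (per-language) retrieval step.
    concepts:  list of sets; each set holds one query word together with
               its translations (the assumed word-level alignment).
    """
    # Map every surface term to the index of its language-independent concept.
    term_to_concept = {t: i for i, terms in enumerate(concepts) for t in terms}

    # Re-index the pooled documents, keeping only query vocabulary.
    doc_tf = {
        doc_id: Counter(term_to_concept[t] for t in tokens if t in term_to_concept)
        for doc_id, tokens in retrieved.items()
    }

    # Document frequency of each concept over the pooled collection.
    n_docs = len(retrieved)
    df = Counter()
    for tf in doc_tf.values():
        df.update(tf.keys())

    # Score every document against the concept-level query.
    # Assumption: a plain log-tf * idf weight stands in for the real scheme.
    scores = {
        doc_id: sum(
            (1 + math.log(freq)) * math.log(1 + n_docs / df[c])
            for c, freq in tf.items()
        )
        for doc_id, tf in doc_tf.items()
    }
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy data: two English and two Spanish documents.
retrieved = {
    "en1": ["economy", "growth", "economy"],
    "es1": ["economía", "crecimiento"],
    "es2": ["fútbol", "economía"],
    "en2": ["football", "match"],
}
concepts = [{"economy", "economía"}, {"growth", "crecimiento"}]

ranking = two_step_rsv(retrieved, concepts)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.3f}")
```

Because all documents are scored in the same index with the same concept statistics, their scores are directly comparable across languages, which is the property a merging step needs.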
Cite this article
Martínez-Santiago, F., Ureña-López, L.A. & Martín-Valdivia, M. A merging strategy proposal: The 2-step retrieval status value method. Inf Retrieval 9, 71–93 (2006). https://doi.org/10.1007/s10791-005-5722-4
DOI: https://doi.org/10.1007/s10791-005-5722-4