Hindi CLIR in thirty days

Published: 01 June 2003

Abstract

As participants in the TIDES Surprise Language exercise, researchers at the University of Massachusetts helped collect Hindi-English resources and developed a cross-language information retrieval system. Components included normalization, stop-word removal, transliteration, structured query translation, and language modeling using a probabilistic dictionary derived from a parallel corpus. Existing technology was successfully applied to Hindi. The biggest stumbling blocks were collecting parallel English and Hindi text and dealing with numerous proprietary encodings.
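The abstract does not give the scoring formula, so the following is only a minimal sketch of one common way a probabilistic translation dictionary can be combined with language modeling for cross-language retrieval: each English query term is scored by summing, over its candidate Hindi translations, the translation probability times a smoothed document probability. The function names, the Jelinek-Mercer smoothing weight `lam`, the probability floor, and the toy dictionary and documents are all illustrative assumptions, not the system described in the article.

```python
import math
from collections import Counter


def doc_model(doc_tokens, collection_counts, collection_len, lam=0.5):
    """Return a function giving a smoothed P(h | D) for a Hindi term h.

    Uses Jelinek-Mercer smoothing (an assumption of this sketch):
    lam * P(h | document) + (1 - lam) * P(h | collection).
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens)

    def p(h):
        p_doc = counts[h] / n if n else 0.0
        p_coll = collection_counts.get(h, 0) / collection_len
        return lam * p_doc + (1 - lam) * p_coll

    return p


def query_log_likelihood(english_query, doc_tokens, trans_prob,
                         collection_counts, collection_len,
                         lam=0.5, floor=1e-9):
    """Log-likelihood of an English query given a Hindi document.

    trans_prob[e][h] is a hypothetical dictionary entry holding P(e | h),
    such as one estimated from a parallel corpus.
    """
    p_h = doc_model(doc_tokens, collection_counts, collection_len, lam)
    score = 0.0
    for e in english_query:
        # Sum over every Hindi term that may translate to e.
        p_e = sum(p_eh * p_h(h) for h, p_eh in trans_prob.get(e, {}).items())
        # Floor avoids log(0) for query terms with no usable translation.
        score += math.log(max(p_e, floor))
    return score


# Toy data, invented purely for illustration.
docs = {
    "d1": ["नदी", "पानी", "पानी"],
    "d2": ["नदी", "शहर"],
}
collection_counts = Counter(t for tokens in docs.values() for t in tokens)
collection_len = sum(collection_counts.values())
trans_prob = {
    "water": {"पानी": 0.6, "जल": 0.4},
    "river": {"नदी": 0.9},
}

query = ["river", "water"]
scores = {
    name: query_log_likelihood(query, tokens, trans_prob,
                               collection_counts, collection_len)
    for name, tokens in docs.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy run the document that mentions both the river and the water term ranks above the one that mentions only the river term. In a real pipeline the normalization, stop-word removal, and transliteration steps named in the abstract would run before scoring.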
