Combining probability models and web mining models: a framework for proper name transliteration

Information Technology and Management

Abstract

The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, they are often rendered phonetically in the target language, a process referred to as transliteration. In this research we propose a generic transliteration framework that incorporates an enhanced Hidden Markov Model (HMM) and a Web mining model. We improve on traditional statistical transliteration in three areas: (1) a simple phonetic transliteration knowledge base; (2) a bigram and a trigram HMM; and (3) a Web mining model that uses word frequency-of-occurrence information from the Web. We evaluated the framework on English–Arabic back-transliteration. Experiments showed that, when the HMM was used alone, a combination of the bigram and trigram HMMs performed best for English–Arabic transliteration. While the bigram model alone achieved fairly good performance, the trigram model alone did not. The Web mining approach boosted the performance by 79.05%. Overall, our framework achieved a precision of 0.72 when the eight best transliterations were considered. Our results show promise for using transliteration techniques to improve multilingual Web retrieval.
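The general idea of combining n-gram evidence with Web frequency can be illustrated with a small sketch. This is not the framework described in the article: the full text is not reproduced here, so the sketch swaps the HMM for a plain character n-gram scorer with add-one smoothing, and the interpolation weight `lam`, the `web_counts` dictionary (standing in for search-engine hit counts), and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

def train_ngram(names, n):
    """Count character n-grams and their (n-1)-character contexts.

    Names are padded with '^' at the start and '$' at the end.
    """
    grams, contexts = Counter(), Counter()
    for name in names:
        padded = "^" * (n - 1) + name + "$"
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            grams[gram] += 1
            contexts[gram[:-1]] += 1
    return grams, contexts

def log_prob(name, model, n, vocab_size=28):
    """Add-one smoothed log-probability of a name under an n-gram model.

    vocab_size is an assumed alphabet size (26 letters plus padding symbols).
    """
    grams, contexts = model
    padded = "^" * (n - 1) + name + "$"
    total = 0.0
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        total += math.log((grams[gram] + 1) / (contexts[gram[:-1]] + vocab_size))
    return total

def rank(candidates, bigram, trigram, web_counts, lam=0.5, top_k=8):
    """Rank candidates by interpolated n-gram score plus log Web frequency.

    lam weights the bigram score against the trigram score (assumed value);
    web_counts maps a candidate spelling to a hypothetical hit count.
    """
    scored = []
    for cand in candidates:
        model_score = (lam * log_prob(cand, bigram, 2)
                       + (1 - lam) * log_prob(cand, trigram, 3))
        web_score = math.log(1 + web_counts.get(cand, 0))
        scored.append((model_score + web_score, cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:top_k]]
```

Returning the `top_k=8` highest-scoring candidates mirrors the abstract's evaluation setting, in which the eight best transliterations are considered. How the article actually weights the two HMMs and the Web evidence is not stated in the abstract, so the additive combination above is only one plausible choice.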

Author information

Corresponding author

Correspondence to Yilu Zhou.

About this article

Cite this article

Zhou, Y., Huang, F. & Chen, H. Combining probability models and web mining models: a framework for proper name transliteration. Inf Technol Manage 9, 91–103 (2008). https://doi.org/10.1007/s10799-007-0031-9

  • DOI: https://doi.org/10.1007/s10799-007-0031-9
