Combining probability models and web mining models: a framework for proper name transliteration

Zhou, Yilu; Huang, Feng; Chen, Hsinchun

doi:10.1007/s10799-007-0031-9

Published: 11 December 2007

Information Technology and Management, Volume 9, pages 91–103 (2008)

Abstract

The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically, a process referred to as transliteration. In this research we propose a generic transliteration framework that incorporates an enhanced Hidden Markov Model (HMM) and a Web mining model. We improved traditional statistical transliteration in three areas: (1) we incorporated a simple phonetic transliteration knowledge base; (2) we incorporated a bigram and a trigram HMM; and (3) we incorporated a Web mining model that uses word frequency-of-occurrence information from the Web. We evaluated the framework on English–Arabic back transliteration. Experiments showed that, when using the HMM alone, a combination of the bigram and trigram HMM approaches performed best for English–Arabic transliteration. While the bigram model alone achieved fairly good performance, the trigram model alone did not. The Web mining approach boosted performance by 79.05%. Overall, our framework achieved a precision of 0.72 when the eight best transliterations were considered. Our results show promise for using transliteration techniques to improve multilingual Web retrieval.
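The abstract does not say how the bigram and trigram scores are combined or how Web frequencies enter the ranking, so the following is only a minimal sketch of one plausible arrangement. It assumes linear interpolation of per-candidate log-probabilities and a re-ranking of the shortlist by Web occurrence counts. The function names, the interpolation weight, and all scores and counts are hypothetical and are not taken from the paper.

```python
def combine_scores(bigram_logp, trigram_logp, weight=0.5):
    """Linearly interpolate log-probabilities from two models (assumed scheme)."""
    return {
        cand: weight * bigram_logp[cand] + (1 - weight) * trigram_logp[cand]
        for cand in bigram_logp
        if cand in trigram_logp
    }


def rerank_with_web_counts(model_scores, web_counts, top_k=8):
    """Shortlist the top_k candidates by model score, then order them by
    Web occurrence count, using the model score only to break ties."""
    shortlist = sorted(model_scores, key=model_scores.get, reverse=True)[:top_k]
    return sorted(
        shortlist,
        key=lambda cand: (web_counts.get(cand, 0), model_scores[cand]),
        reverse=True,
    )


# Invented log-probabilities for three candidate English spellings of one name.
bigram_logp = {"mohammed": -4.0, "muhamad": -3.5, "mhmd": -9.0}
trigram_logp = {"mohammed": -5.0, "muhamad": -5.0, "mhmd": -12.0}

# Invented Web occurrence counts; a candidate with no entry counts as zero.
web_counts = {"mohammed": 1200, "muhamad": 300}

combined = combine_scores(bigram_logp, trigram_logp)
ranked = rerank_with_web_counts(combined, web_counts, top_k=2)
print(ranked)  # ['mohammed', 'muhamad']
```

In this toy data the interpolated model alone prefers "muhamad", while the Web counts move "mohammed" to the top, which illustrates how frequency evidence could correct a statistically plausible but rare spelling. The default `top_k=8` mirrors the eight-best setting reported in the abstract.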

Author information

Authors and Affiliations

Department of Information Systems and Management, George Washington University, Funger Hall, 515N, 2201 G Street, NW, Washington, DC, 20052, USA
Yilu Zhou
Consumer Electronic Group, Handheld Division, Advanced Micro Devices, Inc., Sunnyvale, USA
Feng Huang
Department of Management Information Systems, The University of Arizona, Tucson, USA
Hsinchun Chen

Corresponding author

Correspondence to Yilu Zhou.

About this article

Cite this article

Zhou, Y., Huang, F. & Chen, H. Combining probability models and web mining models: a framework for proper name transliteration. Inf Technol Manage 9, 91–103 (2008). https://doi.org/10.1007/s10799-007-0031-9

Issue Date: June 2008