An Ensemble of Grapheme and Phoneme for Machine Transliteration

Oh, Jong-Hoon; Choi, Key-Sun

doi:10.1007/11562214_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Machine transliteration is an automatic method for generating characters or words in one alphabetical system that correspond to characters in another alphabetical system. Interest in machine transliteration has grown because it supports machine translation and information retrieval. Three kinds of machine transliteration model have been proposed: the “grapheme-based model”, the “phoneme-based model”, and the “hybrid model”. However, few works exploit the correspondence between source graphemes and phonemes, although this correspondence plays an important role in machine transliteration. Furthermore, few works handle source graphemes and phonemes dynamically. In this paper, we propose a new transliteration model based on an ensemble of grapheme and phoneme. Our model makes use of the correspondence and dynamically uses source graphemes and phonemes. Our method outperforms previous work by about 15–23% in English-to-Korean transliteration and by about 15–43% in English-to-Japanese transliteration.

Author information

Authors and Affiliations

Department of Computer Science, KAIST/KORTERM/BOLA, 373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Republic of Korea
Jong-Hoon Oh & Key-Sun Choi

Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Oh, JH., Choi, KS. (2005). An Ensemble of Grapheme and Phoneme for Machine Transliteration. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_40

DOI: https://doi.org/10.1007/11562214_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)
