Expanding a Bilingual WordNet through Hierarchical Translation Classification

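This paper describes an automatic classification method that selects appropriate word senses for word-translation pairs in bilingual resources (e.g., bilingual dictionaries), thereby expanding the lexical coverage of an existing bilingual WordNet. Given a word and its translation, the method traverses the WordNet hyponym hierarchy from general to specific and reduces sense ambiguity by progressively selecting the appropriate hyponym category. We build a classification model for each node of the hyponym hierarchy at which the sense may branch. Trained on an existing bilingual WordNet, each model learns the features shared by the translations of the words under its hyponyms, so that at run time the classifier can select the more suitable hyponym node by matching these features. In addition, we build a classification filtering model that rejects less probable senses, which improves both the speed and the precision of the system. Experimental results show that the system effectively selects the correct WordNet sense for a given word translation. The classification results can in turn serve as training data for retraining the classification models, or be integrated into a machine translation system so that it generates translations more accurately according to meaning.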
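The general-to-specific traversal with per-node classifiers and a filtering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy hierarchy, the node names (`entity`, `group`, `object`, and so on), the use of single characters of the Chinese translation as features, the hand-set weights, and the `reject_below` threshold are all hypothetical. Each branching node scores its children with a softmax over summed feature weights (the scoring form of a maximum entropy classifier), children below the threshold are rejected, and the search descends into the best remaining child.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical sense node in a hyponym hierarchy (toy stand-in for a WordNet synset)."""
    name: str
    children: list = field(default_factory=list)
    # Hypothetical per-child weights: weights[child_name][feature] -> float.
    weights: dict = field(default_factory=dict)


def child_probs(node, features):
    """Softmax over the children of `node`, the scoring form of a maximum entropy classifier."""
    scores = {
        c.name: sum(node.weights.get(c.name, {}).get(f, 0.0) for f in features)
        for c in node.children
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def navigate(root, features, reject_below=0.2):
    """Descend from general to specific senses, filtering out improbable children."""
    path = [root.name]
    node = root
    while node.children:
        probs = child_probs(node, features)
        # Filtering step: reject children whose probability is below the threshold.
        kept = [c for c in node.children if probs[c.name] >= reject_below]
        if not kept:
            break  # no confident refinement: stop at the current, more general sense
        node = max(kept, key=lambda c: probs[c.name])
        path.append(node.name)
    return path


# Hypothetical toy hierarchy and hand-set weights, for illustration only.
root = Node(
    "entity",
    children=[
        Node("group", children=[Node("financial_institution")]),
        Node("object", children=[Node("sloping_land")]),
    ],
    weights={"group": {"銀": 1.5, "行": 1.0}, "object": {"岸": 2.0, "河": 1.0}},
)

# Features: the characters of a Chinese translation of "bank" (an assumed feature set).
print(navigate(root, set("河岸")))
print(navigate(root, set("銀行")))
```

With `reject_below=0.6`, a translation that matches no weighted feature (for example `東西`) leaves both children of `entity` at probability 0.5, so the sketch stops at the general node instead of guessing. Greedy descent is a simplifying choice here; a system could also keep several surviving branches and compare them further down.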

Keywords

Translation sense selection; word sense disambiguation; bilingual WordNet; maximum entropy model

Parallel Abstract

We introduce a method for learning to assign word senses to bilingual translation pairs. In our approach, this problem is transformed into one of navigating through a sense network (e.g., WordNet), with the aim of relating the features of translations to the sense nodes in the network. The method involves automatically constructing a classification model for each branching node in the sense network and learning to reject less probable sense categories for a translation based on the translation characteristics of semantically related word groups (e.g., the words in a lexical category). At run time, the given translations are expanded with their synonyms, and the sense ambiguity is resolved according to the trained classification models. Evaluation shows that the method significantly outperforms the strong baseline of assigning the most frequent sense to each translation pair. Our method effectively determines adequate word senses for given word-translation pairs, suggesting that it could serve as a computer-assisted tool for lexicography or help machine translation systems with word selection.
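The run-time synonym expansion and a per-node maximum entropy model can be sketched as follows. This is an illustrative assumption, not the paper's feature set or training procedure: the synonym table `SYNONYMS`, the character-level features, the two labels, the training examples, and the hyperparameters (`epochs`, `lr`) are all hypothetical. The classifier is an unregularized multinomial logistic regression (the standard maximum entropy form) trained by batch gradient ascent on the log-likelihood, where each weight's gradient is the observed feature count minus the count expected under the current model.

```python
import math
from collections import defaultdict

# Hypothetical synonym table for Chinese translations (an assumption, for illustration only).
SYNONYMS = {"銀行": ["金融機構"], "河岸": ["河畔", "岸邊"]}


def expand(translation):
    """Features: characters of the translation plus characters of its listed synonyms."""
    words = [translation] + SYNONYMS.get(translation, [])
    return {ch for w in words for ch in w}


def predict_proba(w, feats, labels):
    """P(label | features) under a maximum entropy model with (feature, label) weights."""
    scores = {y: sum(w.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train_maxent(examples, labels, epochs=200, lr=0.5):
    """Unregularized multinomial logistic regression trained by batch gradient ascent."""
    w = defaultdict(float)  # (feature, label) -> weight
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            p = predict_proba(w, feats, labels)
            for f in feats:
                grad[(f, gold)] += 1.0      # observed feature count
                for y in labels:
                    grad[(f, y)] -= p[y]    # expected feature count under the model
        for k, g in grad.items():
            w[k] += lr * g
    return w


# Hypothetical training data: translations of words under two hyponym nodes.
LABELS = ["financial_institution", "sloping_land"]
TRAIN = [
    (expand("銀行"), "financial_institution"),
    (expand("信用社"), "financial_institution"),
    (expand("河岸"), "sloping_land"),
    (expand("山坡"), "sloping_land"),
]

weights = train_maxent(TRAIN, LABELS)
print(predict_proba(weights, expand("金融機構"), LABELS))
```

Because `銀行` is expanded with its synonym `金融機構` during training, the translation `金融機構` is assigned to `financial_institution` at run time even though it never appears as a training example's own translation. A translation sharing no feature with the training data (for example `東西`) receives a uniform distribution over the two labels.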

Parallel Keywords

word translation classification; word sense disambiguation; bilingual WordNet; maximum entropy model

