Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models

Authors

  • Shuang Li, Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
  • Jiangjie Chen, Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
  • Siyu Yuan, School of Data Science, Fudan University
  • Xinyi Wu, Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
  • Hao Yang, Huawei Translation Services Center
  • Shimin Tao, Huawei Translation Services Center
  • Yanghua Xiao, Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University; Fudan-Aishu Cognitive Intelligence Joint Research Center

DOI:

https://doi.org/10.1609/aaai.v38i17.29817

Keywords:

NLP: Machine Translation, Multilinguality, Cross-Lingual NLP, NLP: (Large) Language Models

Abstract

To translate well, machine translation (MT) systems and general-purpose language models (LMs) need a deep understanding of both the source and target languages and cultures. Idioms, being non-compositional, are therefore particularly challenging for Transformer-based systems, because literal translations often miss the intended meaning. Traditional methods, which replace idioms using existing knowledge bases (KBs), often lack scale and context-awareness. Our approach prioritizes both: idioms are stored offline in a KB of manageable size, which allows efficient serving with smaller models and provides a more comprehensive understanding of idiomatic expressions. To this end, we introduce IdiomKB, a multilingual idiom KB developed using large LMs. By retrieving idioms' figurative meanings from this KB, smaller models such as BLOOMZ (7.1B), Alpaca (7B), and InstructGPT (6.7B) produce better translations. We also present a novel GPT-4-powered metric for human-aligned evaluation and use it to show that IdiomKB considerably boosts model performance. Human evaluations further validate the KB's quality.
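
The retrieval idea in the abstract (look up an idiom's figurative meaning, then give it to a smaller model alongside the source sentence) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy KB entries, the substring matching, the prompt wording, and the names `TOY_KB`, `find_idiom`, and `build_prompt` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Hypothetical toy KB mapping idioms to figurative meanings.
# These entries are illustrative and are not taken from IdiomKB.
TOY_KB: Dict[str, str] = {
    "kick the bucket": "to die",
    "break the ice": "to ease tension in a social situation",
}


def find_idiom(sentence: str, kb: Dict[str, str]) -> Optional[str]:
    """Return the first KB idiom found in the sentence (case-insensitive substring match)."""
    lowered = sentence.lower()
    for idiom in kb:
        if idiom in lowered:
            return idiom
    return None


def build_prompt(sentence: str, target_lang: str, kb: Dict[str, str]) -> str:
    """Build a translation prompt, adding the idiom's figurative meaning when one is retrieved."""
    idiom = find_idiom(sentence, kb)
    prompt = f"Translate the following sentence into {target_lang}.\n"
    if idiom is not None:
        # Assumed prompt format: pass the retrieved meaning as a hint to the model.
        prompt += f'Note: the idiom "{idiom}" means "{kb[idiom]}".\n'
    prompt += f"Sentence: {sentence}\nTranslation:"
    return prompt


if __name__ == "__main__":
    print(build_prompt("He tried to break the ice with a joke.", "Chinese", TOY_KB))
```

The resulting prompt string would then be sent to whichever translation model is in use; because the lookup is a plain dictionary access, the KB can be stored offline and served without calling a large model at translation time.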

Published

2024-03-24

How to Cite

Li, S., Chen, J., Yuan, S., Wu, X., Yang, H., Tao, S., & Xiao, Y. (2024). Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18554-18563. https://doi.org/10.1609/aaai.v38i17.29817

Issue

Section

AAAI Technical Track on Natural Language Processing II