One Type Context Is Not Enough: Global Context-aware Neural Machine Translation

Published: 12 November 2022

Abstract

Effectively modeling global context has been a critical challenge for document-level neural machine translation (NMT). Both preceding context and global context have been carefully explored within the sequence-to-sequence (seq2seq) framework. However, previous studies generally map the global context into a single vector, which is insufficient to represent the entire document well, since it largely ignores the hierarchy between sentences and the words within them. In this article, we propose to model the global context of the source language at both the sentence level and the word level. Specifically, at the sentence level we extract the global context that is useful for the current sentence, while at the word level we compute global context with respect to the words within the current sentence. On this basis, the two kinds of global context can be appropriately fused before being incorporated into the state-of-the-art seq2seq model, i.e., the Transformer. Detailed experimentation on various document-level translation tasks shows that global context at both the sentence level and the word level significantly improves translation performance. More encouragingly, the two kinds of global context are complementary, leading to further improvement when both are used.
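To make the two-granularity idea concrete, below is a minimal PyTorch sketch of one plausible reading of the fusion step: a sentence-level query and the word-level states each attend over encoded document states, and a sigmoid gate mixes the resulting context vectors. All module names, tensor shapes, and the gating mechanism are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch (assumed, not the paper's exact model): gated fusion of
# sentence-level and word-level global context for document-level NMT.
import torch
import torch.nn as nn


class GlobalContextFusion(nn.Module):
    """Fuse sentence-level and word-level global context with a learned gate."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Sentence level: one query summarizing the current sentence
        # attends over the encoded document.
        self.sent_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Word level: every word of the current sentence attends over
        # the encoded document individually.
        self.word_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, sent_repr, word_repr, doc_repr):
        # sent_repr: (batch, 1, d_model)        current-sentence summary
        # word_repr: (batch, src_len, d_model)  current-sentence word states
        # doc_repr:  (batch, doc_len, d_model)  encoded document states
        sent_ctx, _ = self.sent_attn(sent_repr, doc_repr, doc_repr)
        word_ctx, _ = self.word_attn(word_repr, doc_repr, doc_repr)
        # Broadcast the sentence-level context to every source position,
        # then let a sigmoid gate decide how to mix the two granularities.
        sent_ctx = sent_ctx.expand_as(word_ctx)
        g = torch.sigmoid(self.gate(torch.cat([sent_ctx, word_ctx], dim=-1)))
        return g * sent_ctx + (1.0 - g) * word_ctx


# Toy usage with random tensors standing in for encoder outputs.
fusion = GlobalContextFusion(d_model=512)
sent = torch.randn(2, 1, 512)
words = torch.randn(2, 20, 512)
doc = torch.randn(2, 100, 512)
fused = fusion(sent, words, doc)  # (2, 20, 512), ready to combine with encoder states
```

The per-position sigmoid gate is one common fusion choice for combining complementary context signals; the paper's actual fusion and integration into the Transformer encoder may differ.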



• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 6
  November 2022
  372 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3568970


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 November 2022
      • Online AM: 31 March 2022
      • Accepted: 12 March 2022
      • Revised: 7 March 2022
      • Received: 8 August 2021


      Qualifiers

      • research-article
      • Refereed
