research-article
Just Accepted

Exploring Graph-based Transformer Encoder for Low-Resource Neural Machine Translation

Online AM: 25 May 2023

Abstract

The Transformer is widely used in Neural Machine Translation (NMT), but it suffers from over-parameterization in low-resource settings: simply scaling up the number of model parameters does not improve performance. In this study, we propose a graph-based approach that adds only a small number of parameters yet significantly outperforms the scaled version of the Transformer. We use Graph Neural Networks to encode Universal Conceptual Cognitive Annotation (UCCA), so that the linguistic features of UCCA are incorporated into the word embeddings. The resulting embeddings are more informative, which improves the performance of the NMT system. Experimental results show that the proposed method outperforms the scaled Transformer by +0.4, +0.41, and +0.33 BLEU on the English-Vietnamese, English-French, and English-Czech datasets, respectively. Furthermore, the method uses 47% fewer parameters than the scaled Transformer. A thorough analysis of error patterns indicates that the proposed method provides structural awareness to translation systems. Our code is available at https://github.com/nqbinh17/UCCA_GNN.
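
To illustrate the general idea of enriching word embeddings with graph structure, the following is a minimal sketch of one mean-aggregation message-passing step in plain Python. It is not the architecture of the paper, whose implementation is in the repository linked above. The function name `mean_aggregate`, the toy graph, the node labelled `unit`, and the two-dimensional embeddings are all assumptions made for illustration.

```python
# Illustrative only: one mean-aggregation message-passing step on a toy graph.
# The graph, node names, and dimensions are hypothetical, not taken from the paper.

def mean_aggregate(embeddings, edges):
    """Return new embeddings where each node is averaged with its neighbours.

    embeddings: dict mapping node id -> list of floats (all the same length)
    edges: list of (source, target) pairs, treated as undirected here
    """
    # Build an undirected neighbour table; every node starts with itself
    # (a self-loop), so isolated nodes keep their own embedding.
    neighbours = {node: [node] for node in embeddings}
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)

    updated = {}
    for node, group in neighbours.items():
        dim = len(embeddings[node])
        # Average each dimension over the node and its neighbours.
        updated[node] = [
            sum(embeddings[n][i] for n in group) / len(group)
            for i in range(dim)
        ]
    return updated


# Toy example: two word nodes attached to one hypothetical internal unit.
embeddings = {
    "she": [1.0, 0.0],
    "runs": [0.0, 1.0],
    "unit": [0.0, 0.0],  # hypothetical internal node grouping the two words
}
edges = [("unit", "she"), ("unit", "runs")]

enriched = mean_aggregate(embeddings, edges)
```

After this step, each word vector carries information from the internal node it is attached to, and the internal node summarizes the words beneath it. Learned weights, edge labels, and attention, which practical Graph Neural Networks typically add, are omitted here for brevity.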


    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Just Accepted
      ISSN: 2375-4699
      EISSN: 2375-4702

      Copyright © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Online AM: 25 May 2023
      • Accepted: 23 May 2023
      • Revised: 15 May 2023
      • Received: 30 January 2022