Abstract
The Transformer is widely used in Neural Machine Translation (NMT), but it suffers from over-parameterization in low-resource settings: simply scaling up the number of parameters does not improve performance. In this study, we propose a graph-based approach that adds only a small number of parameters yet significantly outperforms a scaled-up Transformer. We achieve this by using Graph Neural Networks to encode Universal Conceptual Cognitive Annotation (UCCA) graphs, so that UCCA's linguistic features are incorporated into the word embeddings. This improves the NMT system because the resulting embeddings carry richer linguistic information. Experimental results show that the proposed method outperforms the scaled Transformer by +0.40, +0.41, and +0.33 BLEU on the English-Vietnamese, English-French, and English-Czech datasets, respectively, while using 47% fewer parameters. A thorough analysis of error patterns further shows that the proposed method gives the translation system structural awareness. Our code is available at: https://github.com/nqbinh17/UCCA_GNN.
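The abstract outlines the core mechanism: a GNN encodes the UCCA graph of the source sentence, and the resulting node representations are merged into the Transformer's word embeddings before the encoder stack. Below is a minimal PyTorch sketch of that idea, not the authors' implementation (see the linked repository for the real code); the mean-aggregation GCN layer, the additive fusion, and the `alignment` input mapping each token to a UCCA node are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): fuse GNN-encoded UCCA node
# features into Transformer word embeddings before the encoder stack.
import torch
import torch.nn as nn


class SimpleGCNLayer(nn.Module):
    """One mean-aggregation graph convolution over a dense adjacency matrix."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (num_nodes, dim); adj: (num_nodes, num_nodes), with self-loops.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)  # node degrees
        return torch.relu(self.linear(adj @ x / deg))        # mean-aggregate, project


class UCCAGraphEncoder(nn.Module):
    """Stack of GCN layers over the UCCA graph of one source sentence."""

    def __init__(self, dim, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList([SimpleGCNLayer(dim) for _ in range(num_layers)])

    def forward(self, node_feats, adj):
        h = node_feats
        for layer in self.layers:
            h = layer(h, adj)
        return h  # (num_nodes, dim)


def fuse_ucca_into_embeddings(token_emb, node_feats, adj, alignment, encoder):
    """token_emb: (seq_len, dim) word embeddings;
    alignment: (seq_len,) long tensor giving the UCCA node covering each
    token (an assumed input format). Returns UCCA-enriched embeddings."""
    node_repr = encoder(node_feats, adj)      # encode the UCCA graph
    return token_emb + node_repr[alignment]   # additive fusion (assumption)
```

In practice, the UCCA graphs would come from an off-the-shelf parser (e.g., a transition-based UCCA parser), and tokens split by subword segmentation would need their alignment to UCCA leaf nodes tracked accordingly.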