
Independent Fusion of Words and Image for Multimodal Machine Translation

  • Conference paper
Machine Translation (CCMT 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1104)


Abstract

Multimodal machine translation, which incorporates visual information from images, has become a research hotspot in recent years. Most existing works project image features into the text semantic space and merge them into the model in different ways. In practice, however, different source words may capture different visual information. We therefore propose a multimodal neural machine translation (MNMT) model that integrates words and the visual information of the image independently. For each source word, the word itself and the most similar visual information of the image are fused independently into the word's textual semantics, enhancing both the textual semantics and the corresponding visual information of different words. These fused representations are then used to compute the context vector in the attention of our model's decoder. We carry out experiments on the original English-German sentence pairs of the multimodal machine translation dataset Multi30k, and on Indonesian-Chinese sentence pairs that were manually annotated. Compared with existing RNN-based MNMT models, our model achieves better performance, which demonstrates its effectiveness.
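
As a rough illustration of the fusion idea described above, here is a minimal sketch; it is not the authors' code, and every name in it (independent_fusion, word_states, img_feats, W_fuse) as well as the exact fusion form is an assumption. Each source word attends over spatial image features by similarity, and the resulting word-specific visual vector is fused with that word's representation independently of the other words; the decoder's attention then treats the fused vectors as its source annotations.

import torch
import torch.nn.functional as F

def independent_fusion(word_states, img_feats, W_fuse):
    # word_states: (seq_len, d) encoder hidden states, one per source word
    # img_feats:   (n_regions, d) spatial image features (e.g., a CNN feature map)
    # W_fuse:      (2*d, d) learned fusion projection (hypothetical parameterization)
    sim = word_states @ img_feats.t()        # (seq_len, n_regions) word-region similarity
    attn = F.softmax(sim, dim=-1)            # per-word attention over image regions
    visual = attn @ img_feats                # (seq_len, d) word-specific visual context
    # fuse textual and visual information independently for each word
    fused = torch.tanh(torch.cat([word_states, visual], dim=-1) @ W_fuse)
    return fused                             # used as source annotations by decoder attention

# usage with random tensors standing in for real encoder/CNN outputs
seq_len, n_regions, d = 5, 49, 512
fused = independent_fusion(torch.randn(seq_len, d),
                           torch.randn(n_regions, d),
                           torch.randn(2 * d, d))
print(fused.shape)  # torch.Size([5, 512])

A standard attention decoder can then compute its context vector over these fused annotations exactly as it would over text-only encoder states.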


Notes

  1. http://www.statmt.org/wmt16/multimodal-task.html
  2. http://shannon.cs.illinois.edu/DenotationGraph/
  3. https://github.com/moses-smt/mosesdecoder
  4. https://translate.google.com/


Acknowledgements

This work is supported by the National Natural Science Foundation of China (61976062) and the Science and Technology Program of Guangzhou, China (201904010303).

Author information

Corresponding author

Correspondence to Xia Li.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ma, J., Qin, S., Chen, M., Li, X. (2019). Independent Fusion of Words and Image for Multimodal Machine Translation. In: Huang, S., Knight, K. (eds) Machine Translation. CCMT 2019. Communications in Computer and Information Science, vol 1104. Springer, Singapore. https://doi.org/10.1007/978-981-15-1721-1_4

  • DOI: https://doi.org/10.1007/978-981-15-1721-1_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1720-4

  • Online ISBN: 978-981-15-1721-1

  • eBook Packages: Computer Science, Computer Science (R0)
