Abstract
Multimodal machine translation, which incorporates visual information from images, has become a research hotspot in recent years. Most existing works project image features into the textual semantic space and merge them into the model in various ways. In practice, however, different source words may attend to different visual information. We therefore propose a multimodal neural machine translation (MNMT) model that fuses words and the visual information of an image independently: each word is fused, on its own, with the image regions most similar to it, so that the textual semantics of each word are enhanced by its corresponding visual information. The fused representations are then used to compute the context vector in the decoder's attention. We conduct experiments on the original English-German sentence pairs of the Multi30k multimodal machine translation dataset and on manually annotated Indonesian-Chinese sentence pairs. Compared with existing RNN-based MNMT models, our model achieves better performance, demonstrating its effectiveness.
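The per-word fusion described above can be viewed as a word-conditioned attention over image region features: each source word queries the region features, and the attended visual vector is merged back into that word's representation independently of the other words. The sketch below is an illustrative reconstruction, not the authors' exact formulation; the projection matrices, the scaled dot-product scoring, and the additive fusion are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_words_with_image(word_embs, region_feats, W_q, W_k, W_v):
    """Independently fuse each source word with its most similar image regions.

    word_embs:    (T, d)   text embeddings, one per source word
    region_feats: (R, d_v) image region features (e.g. a CNN feature map)
    W_q, W_k, W_v: learned projections (random placeholders here)
    """
    Q = word_embs @ W_q                        # (T, d_a) one query per word
    K = region_feats @ W_k                     # (R, d_a) region keys
    V = region_feats @ W_v                     # (R, d)   regions projected to text space
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # (T, R)  word-region similarity
    attn = softmax(scores, axis=-1)            # each word weights regions independently
    visual = attn @ V                          # (T, d)  per-word visual context
    return word_embs + visual                  # fused representation fed to the attention

# Toy shapes: 5 words, 49 regions (a 7x7 feature map), d=8, d_v=16, d_a=8.
rng = np.random.default_rng(0)
T, R, d, d_v, d_a = 5, 49, 8, 16, 8
fused = fuse_words_with_image(
    rng.normal(size=(T, d)),
    rng.normal(size=(R, d_v)),
    rng.normal(size=(d, d_a)),
    rng.normal(size=(d_v, d_a)),
    rng.normal(size=(d_v, d)),
)
print(fused.shape)  # (5, 8)
```

The decoder would then compute its attention context vector over these fused vectors instead of the text-only encoder states, so each source position already carries the visual evidence most relevant to that word.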
Acknowledgements
This work is supported by the National Natural Science Foundation of China (61976062) and the Science and Technology Program of Guangzhou, China (201904010303).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Ma, J., Qin, S., Chen, M., Li, X. (2019). Independent Fusion of Words and Image for Multimodal Machine Translation. In: Huang, S., Knight, K. (eds) Machine Translation. CCMT 2019. Communications in Computer and Information Science, vol 1104. Springer, Singapore. https://doi.org/10.1007/978-981-15-1721-1_4
DOI: https://doi.org/10.1007/978-981-15-1721-1_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1720-4
Online ISBN: 978-981-15-1721-1