Abstract
Image captioning is the task of generating a description of an image, which requires recognizing the important attributes in the image as well as the relationships among them. The task demands sentences that are both semantically and syntactically correct. Most image captioning models are based on RNNs trained with maximum likelihood estimation (MLE). We propose a novel model based on a generative adversarial network (GAN) that generates the caption of an image from its learned representation and does not require a secondary learning algorithm such as policy gradient. Because benchmark datasets such as Flickr and COCO are demanding in both volume and complexity, we introduce a new dataset and perform our experiments on it. The experimental results show the effectiveness of our model compared with state-of-the-art image captioning methods.
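To make the abstract's central claim concrete, the sketch below shows one way a caption GAN can be trained without a policy-gradient step: a Gumbel-softmax relaxation keeps token sampling differentiable, so the discriminator's gradient flows directly into the generator. This is a minimal illustration under assumed dimensions and a hypothetical architecture, not the authors' model; all module names and hyperparameters here are illustrative.

```python
# Hypothetical sketch: GAN caption generator trained end-to-end (no policy gradient).
# Assumption: Gumbel-softmax is used as the differentiable sampling relaxation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, IMG_FEAT, HID, MAX_LEN = 1000, 64, 256, 128, 12

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.img2h = nn.Linear(IMG_FEAT, HID)   # image feature -> initial hidden state
        self.embed = nn.Embedding(VOCAB, EMB)
        self.gru = nn.GRUCell(EMB, HID)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, img_feat, tau=1.0):
        b = img_feat.size(0)
        h = torch.tanh(self.img2h(img_feat))
        tok = self.embed(torch.zeros(b, dtype=torch.long))    # <BOS> = index 0
        soft_tokens = []
        for _ in range(MAX_LEN):
            h = self.gru(tok, h)
            logits = self.out(h)
            y = F.gumbel_softmax(logits, tau=tau, hard=False)  # differentiable sample
            soft_tokens.append(y)
            tok = y @ self.embed.weight                        # soft embedding lookup
        return torch.stack(soft_tokens, dim=1)                 # (b, MAX_LEN, VOCAB)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(VOCAB + IMG_FEAT, HID, batch_first=True)
        self.score = nn.Linear(HID, 1)

    def forward(self, soft_caption, img_feat):
        img = img_feat.unsqueeze(1).expand(-1, soft_caption.size(1), -1)
        _, h = self.gru(torch.cat([soft_caption, img], dim=-1))
        return self.score(h[-1])    # real/fake logit per (image, caption) pair

G, D = Generator(), Discriminator()
img = torch.randn(4, IMG_FEAT)      # stand-in for CNN image features
fake = G(img)
g_loss = F.binary_cross_entropy_with_logits(D(fake, img), torch.ones(4, 1))
g_loss.backward()                   # gradients reach G directly: no policy gradient
```

Because the generator emits soft token distributions rather than discrete samples, the discriminator's loss backpropagates through the whole decoding loop, which is the property that removes the need for REINFORCE-style training used by discrete-token caption GANs.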
Ethics declarations
Human participants or animals
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Dehaqi, A.M., Seydi, V. & Madadi, Y. Adversarial Image Caption Generator Network. SN COMPUT. SCI. 2, 182 (2021). https://doi.org/10.1007/s42979-021-00486-y