Abstract
Image captioning has recently become an immensely popular area in the field of computer vision, and research in this area is active, with various machine-learning-based image captioning models proposed in the literature. The task is to generate natural-language sentences that describe the salient parts of a given image. The main challenge for existing approaches is extracting image features effectively enough to generate adequate captions; there is also a need to improve the generalizability of results on large and diverse datasets. In the current paper, a novel method, Next-LSTM, is proposed for image captioning. It first extracts image features using ResNeXt, a powerful convolutional neural network adopted for the first time in the image captioning domain, and then applies a long short-term memory (LSTM) network to the extracted features to generate accurate captions. The proposed framework is evaluated on the benchmark Flickr8k dataset using accuracy and BLEU score, and its performance is compared to state-of-the-art approaches, which it outperforms.
Funding
The authors did not receive funding from any organization for the submitted work.
Ethics declarations
Conflict of interest
The authors confirm that there are no known conflicts of interest associated with this publication and that there have been no financial gains from this work that could have influenced its outcome.
Human and/or animal participants
None of the authors conducted any experiments with human participants or animals for this paper.
Informed consent
None of the authors conducted any investigations involving human subjects or animals for this research work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Singh, P., Kumar, C. & Kumar, A. Next-LSTM: a novel LSTM-based image captioning technique. Int J Syst Assur Eng Manag 14, 1492–1503 (2023). https://doi.org/10.1007/s13198-023-01956-7
Received:
Revised:
Accepted:
Published:
Issue Date: