Abstract
Image captioning is a challenging problem in computer vision with numerous practical applications. Recently, dense image captioning has emerged as a way to understand an image more fully by localizing and describing multiple salient regions that cover it. Although state-of-the-art approaches have made encouraging progress, their ability to localize target regions and describe them accurately still falls short of expectations. To address this shortcoming, this paper proposes a precise feature extraction (PFE) method to further improve dense image captioning. We evaluate our model on the Visual Genome dataset, and the results show that it outperforms other state-of-the-art methods.
Acknowledgement
This research was supported by the Sichuan Provincial Science and Technology Department (Nos. 2018GZ0517, 2019YFS0146, 2019YFS0155), the State Key Laboratory of ASIC & System (No. 2018KF003), the National Natural Science Foundation of China (No. 61907009), the Natural Science Foundation of Guangdong Province (No. 2018A030313802), and the Science and Technology Planning Project of Guangdong Province (Nos. 2017B010110007 and 2017B010110015).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z. et al. (2019). Dense Image Captioning Based on Precise Feature Extraction. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1143. Springer, Cham. https://doi.org/10.1007/978-3-030-36802-9_10
DOI: https://doi.org/10.1007/978-3-030-36802-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36801-2
Online ISBN: 978-3-030-36802-9
eBook Packages: Computer Science, Computer Science (R0)