Abstract
Automatic image annotation is an attractive service for users and administrators of online photo-sharing websites. In this paper, we propose an image annotation approach that exploits cross-modal saliency correlation, covering both visual and textual saliency. For textual saliency, a concept graph is first built from the associations between labels; semantic communities and latent textual saliency are then detected. For visual saliency, we adopt a dual-layer bag-of-words (DL-BoW) model that integrates the local features and salient regions of the image. Experiments on the MIRFlickr and IAPR TC-12 datasets demonstrate that the proposed method outperforms other state-of-the-art approaches.
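The textual side of the pipeline (a concept graph built from label associations, followed by community detection) can be illustrated with a minimal sketch. The abstract does not state which association measure or detection algorithm is used, so everything here is an assumption: edges are weighted by the Jaccard overlap of the image sets carrying each label, and thresholded connected components stand in for a proper community detection method. The function names, the `min_weight` parameter, and the toy annotations are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_graph(label_sets):
    """Weight each label pair by the Jaccard overlap of the images carrying them.

    Assumption: Jaccard co-occurrence is only one possible association measure.
    """
    count = defaultdict(int)       # number of images per label
    pair_count = defaultdict(int)  # number of images per unordered label pair
    for labels in label_sets:
        unique = sorted(set(labels))
        for label in unique:
            count[label] += 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1

    graph = {label: {} for label in count}
    for (a, b), n_ab in pair_count.items():
        weight = n_ab / (count[a] + count[b] - n_ab)
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def detect_communities(graph, min_weight=0.6):
    """Group labels connected by edges of weight >= min_weight.

    Assumption: connected components are a simple stand-in for a real
    community detection algorithm (for example a modularity-based one).
    """
    seen = set()
    communities = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor, weight in graph[node].items():
                if weight >= min_weight and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(sorted(members))
    return communities


# Toy annotations, invented for illustration only.
toy_labels = [
    ["sky", "cloud", "sea"],
    ["sky", "cloud"],
    ["sea", "beach"],
    ["sea", "beach", "sky"],
    ["dog", "grass"],
    ["dog", "grass"],
]
graph = build_concept_graph(toy_labels)
communities = detect_communities(graph, min_weight=0.6)
print(communities)
```

On these toy annotations the sketch separates the labels into three groups: beach and sea, cloud and sky, dog and grass. Lowering `min_weight` merges weakly associated groups, which is the kind of trade-off a real community detection method would resolve without a hand-set threshold.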
Notes
The term ‘community’ comes from network research; it is similar, but not identical, to a ‘clique’ in graph-cut problems.
Acknowledgments
This work is partly supported by NSFC, China (No. 61572315) and the 973 Plan, China (No. 2015CB856004).
Cite this article
Gu, Y., Xue, H. & Yang, J. Cross-Modal Saliency Correlation for Image Annotation. Neural Process Lett 45, 777–789 (2017). https://doi.org/10.1007/s11063-016-9511-4
DOI: https://doi.org/10.1007/s11063-016-9511-4