
Multiple level visual semantic fusion method for image re-ranking

  • Special Issue Paper
  • Published in: Multimedia Systems

Abstract

Mid-level semantic attributes have achieved some success in image retrieval and re-ranking. However, because of the semantic gap between low-level features and intermediate semantic concepts, considerable information is lost when converting low-level features into semantic concepts. To tackle this problem, we attempt to bridge the semantic gap by exploiting the complementarity of different mid-level features. In this paper, a framework is proposed that improves image re-ranking by fusing multiple mid-level features. The framework combines three mid-level features (DCNN-ImageNet attributes, Fisher vectors, and sparse coding spatial pyramid matching) through a semi-supervised multigraph-based model. In addition, the framework can easily be extended to an arbitrary number of features. Experiments conducted on the a-Pascal dataset show that fusing different features in this way efficiently boosts image re-ranking performance.
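As a rough illustration of the fusion idea, the sketch below is a minimal, hypothetical implementation rather than the paper's exact model: it builds a Gaussian kNN affinity graph for each mid-level feature, combines the symmetrically normalized graphs with fixed weights, and propagates query relevance over the fused graph in a manifold-ranking style. The function names, the kNN affinity construction, the equal fusion weights, and the propagation parameters are all illustrative assumptions.

```python
# Minimal sketch of multi-graph fusion for semi-supervised re-ranking.
# NOTE: this is NOT the paper's exact formulation; the kNN affinity
# construction, fixed fusion weights, and propagation parameters are
# illustrative assumptions.
import numpy as np

def knn_affinity(features, k=10, sigma=1.0):
    """Build a symmetric kNN affinity matrix with a Gaussian kernel."""
    n = features.shape[0]
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.maximum(dist, 0.0, out=dist)          # clip numerical negatives
    W = np.exp(-dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    keep = np.argsort(-W, axis=1)[:, :k]     # k strongest edges per node
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(n)[:, None], keep] = True
    return np.where(mask | mask.T, W, 0.0)   # symmetrize

def normalize_graph(W):
    """Symmetric normalization D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def fuse_and_rank(feature_list, weights, query_idx, alpha=0.9, iters=50):
    """Fuse per-feature graphs and propagate relevance from the query images."""
    S = sum(w * normalize_graph(knn_affinity(F))
            for F, w in zip(feature_list, weights))
    n = S.shape[0]
    y = np.zeros(n)
    y[query_idx] = 1.0                       # labeled / query relevance seeds
    f = y.copy()
    for _ in range(iters):                   # manifold-ranking style iteration
        f = alpha * S @ f + (1.0 - alpha) * y
    return np.argsort(-f)                    # re-ranked order by propagated score

# Hypothetical usage with three mid-level representations (e.g. attribute
# scores, Fisher vectors, sparse-coding SPM codes), equally weighted here:
# order = fuse_and_rank([attrs, fisher, scspm], [1/3, 1/3, 1/3], query_idx=[0])
```

In this sketch the per-feature graphs are simply averaged for clarity; the weights could equally well be tuned or learned per feature.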

Acknowledgments

This work was partially supported by the Shenzhen Applied Technology Engineering Laboratory for Internet Multimedia Application under Shenzhen Development and Reform Commission Grant No. 2012720; the Public Service Platform of Mobile Internet Application Security Industry under Shenzhen Development and Reform Commission Grant No. 2012720; Research on Key Technology in Developing Mobile Internet Intelligent Terminal Application Middleware under Grant No. JC201104210032A; Research on Key Technology of Vision Based Intelligent Interaction under Grant No. JC201005260112A; the National Natural Science Foundation of China under Grant No. 61402181; and the Science and Technology Programme of Guangzhou Municipal Government under Grant No. 2014J4100006.

Author information

Corresponding author: Xuan Wang.

About this article

Cite this article

Qi, S., Wang, F., Wang, X. et al. Multiple level visual semantic fusion method for image re-ranking. Multimedia Systems 23, 155–167 (2017). https://doi.org/10.1007/s00530-014-0448-z
