A probabilistic semantic model for image annotation and multi-modal image retrieval

  • Regular Paper
  • Multimedia Systems

Abstract

This paper addresses the problem of automatic image annotation and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which the visual features and the textual words are connected via a hidden layer; this layer constitutes the semantic concepts to be discovered and explicitly exploits the synergy among the modalities. (2) The association between visual features and textual words is determined in a Bayesian framework, so that a confidence value can be provided for each association. (3) We report an extensive evaluation of a prototype system based on the model on a large-scale, visually and semantically diverse image collection crawled from the Web. In the proposed probabilistic model, the hidden concept layer that connects the visual feature layer and the word layer is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system for multi-modal image retrieval, on 17,000 images and 7,736 annotation words automatically extracted from the crawled Web pages, indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
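The abstract states the idea without equations, so the following is only a rough sketch of how a hidden concept layer could be fitted with EM and then used for annotation. It is not the authors' implementation. It assumes one diagonal Gaussian over visual features and one multinomial over words per concept; the names `fit_concepts`, `annotate`, and `log_gauss`, the initialisation scheme, and the toy data are all hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def log_gauss(X, mu, var):
    """Log density of each row of X (N, d) under K diagonal Gaussians -> (N, K)."""
    diff = X[:, None, :] - mu[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / var, axis=2)
                   + np.sum(np.log(2.0 * np.pi * var), axis=1))


def fit_concepts(X, counts, K, n_iter=50):
    """EM for an assumed model P(x, w) = sum_k P(z_k) p(x | z_k) P(w | z_k).

    X: (N, d) image feature vectors; counts: (N, M) word counts per image.
    Returns concept priors, Gaussian means and variances, word distributions.
    """
    N, d = X.shape
    M = counts.shape[1]
    pz = np.full(K, 1.0 / K)                                    # P(z_k)
    mu = X[np.linspace(0, N - 1, K).astype(int)].astype(float)  # assumed init
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))
    pw = np.full((K, M), 1.0 / M)                               # P(w_j | z_k)
    for _ in range(n_iter):
        # E-step: posterior over concepts for every (image, word) pair.
        log_r = (np.log(pz + EPS)[None, None, :]
                 + log_gauss(X, mu, var)[:, None, :]
                 + np.log(pw.T + EPS)[None, :, :])
        log_r -= log_r.max(axis=2, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=2, keepdims=True)
        # M-step: re-estimate parameters from count-weighted posteriors.
        w = counts[:, :, None] * r                # (N, M, K)
        mass = w.sum(axis=(0, 1)) + EPS           # (K,)
        pz = mass / mass.sum()
        pw = (w.sum(axis=0) / mass).T             # (K, M)
        wi = w.sum(axis=1)                        # (N, K)
        mu = (wi.T @ X) / mass[:, None]
        diff2 = (X[:, None, :] - mu[None, :, :]) ** 2
        var = np.einsum("nk,nkd->kd", wi, diff2) / mass[:, None] + 1e-6
    return pz, mu, var, pw


def annotate(x, pz, mu, var, pw):
    """Word probabilities for a new image: P(w | x) = sum_k P(w | z_k) P(z_k | x)."""
    log_post = np.log(pz + EPS) + log_gauss(x[None, :], mu, var)[0]
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ pw


# Toy data (hypothetical): two visual clusters, each with its own two words.
X = np.array([[0.0], [0.1], [-0.1], [10.0], [10.1], [9.9]])
counts = np.array([[3, 2, 0, 0]] * 3 + [[0, 0, 2, 3]] * 3, dtype=float)
pz, mu, var, pw = fit_concepts(X, counts, K=2)
probs = annotate(np.array([0.05]), pz, mu, var, pw)
print(probs)
```

The vector returned by `annotate` is a probability distribution over the vocabulary, which is one way to read the abstract's claim that the Bayesian framework supplies a confidence for each image-word association.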

About this article

Cite this article

Zhang, R., Zhang, Z., Li, M. et al. A probabilistic semantic model for image annotation and multi-modal image retrieval. Multimedia Systems 12, 27–33 (2006). https://doi.org/10.1007/s00530-006-0025-1

  • DOI: https://doi.org/10.1007/s00530-006-0025-1
