A probabilistic semantic model for image annotation and multi-modal image retrieval

Zhang, Ruofei; Zhang, Zhongfei (Mark); Li, Mingjing; Ma, Wei-Ying; Zhang, Hong-Jiang

doi:10.1007/s00530-006-0025-1

Regular Paper
Published: 27 April 2006

Multimedia Systems, Volume 12, pages 27–33 (2006)

Ruofei Zhang¹, Zhongfei (Mark) Zhang¹, Mingjing Li², Wei-Ying Ma² & Hong-Jiang Zhang²

Abstract

This paper addresses the problem of automatic image annotation and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which the visual features and the textual words are connected via a hidden layer; this layer constitutes the semantic concepts to be discovered and explicitly exploits the synergy among the modalities. (2) The association of visual features and textual words is determined in a Bayesian framework, so that the confidence of the association can be provided. (3) We report an extensive evaluation of the prototype system based on the model on a large-scale, visually and semantically diverse image collection crawled from the Web. In the proposed probabilistic model, the hidden concept layer, which connects the visual feature layer and the word layer, is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system on 17,000 images and 7,736 automatically extracted annotation words from crawled Web pages for multi-modal image retrieval indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
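The hidden-concept idea in the abstract can be illustrated with a small sketch. The abstract does not give the model's distributions or update equations, so everything below is an assumption made for illustration: each hidden concept z is given a unit-variance isotropic Gaussian over image feature vectors and a multinomial over words, the parameters are fit with EM, and words are scored for a new image by P(w | f) = Σ_z P(w | z) P(z | f). The names `fit_concepts` and `annotate` are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_concepts(feats, counts, n_concepts, n_iter=50, seed=0):
    """EM for a toy hidden-concept model (illustrative assumptions only).

    feats:  (N, D) array, one visual feature vector per image
    counts: (N, V) array, word counts of each image's annotation
    Assumes p(f|z) is a unit-variance isotropic Gaussian and P(w|z) is multinomial.
    """
    rng = np.random.default_rng(seed)
    n = feats.shape[0]
    v = counts.shape[1]
    prior = np.full(n_concepts, 1.0 / n_concepts)            # P(z)
    # Deterministic init (a simplification): evenly spaced training images.
    means = feats[np.linspace(0, n - 1, n_concepts).astype(int)].copy()
    word_p = rng.dirichlet(np.ones(v), size=n_concepts)      # P(w|z), rows sum to 1

    for _ in range(n_iter):
        # E-step: responsibilities P(z | f_i, w_i), computed in log space.
        sq = ((feats[:, None, :] - means[None, :, :]) ** 2).sum(-1)   # (N, K)
        log_r = np.log(prior) - 0.5 * sq + counts @ np.log(word_p).T
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate P(z), the Gaussian means, and P(w|z).
        mass = resp.sum(axis=0) + 1e-12
        prior = mass / mass.sum()
        means = (resp.T @ feats) / mass[:, None]
        word_p = resp.T @ counts + 1e-6                       # small smoothing term
        word_p /= word_p.sum(axis=1, keepdims=True)
    return prior, means, word_p

def annotate(feat, prior, means, word_p):
    """Score every word for one image: P(w|f) = sum_z P(w|z) P(z|f)."""
    sq = ((feat[None, :] - means) ** 2).sum(-1)
    log_post = np.log(prior) - 0.5 * sq
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                                        # P(z|f)
    return post @ word_p
```

Because `annotate` returns a full distribution over the vocabulary, each suggested word comes with a probability that can serve as a confidence value, which is the kind of output the Bayesian framing in the abstract refers to. Ranking training images by P(f | w) for a query word would be the reverse direction; it is omitted here.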

Author information

Authors and Affiliations

Department of Computer Science, SUNY at Binghamton, Binghamton, NY, 13902, USA
Ruofei Zhang & Zhongfei (Mark) Zhang
Microsoft Research Asia, Beijing, 100080, People's Republic of China
Mingjing Li, Wei-Ying Ma & Hong-Jiang Zhang

About this article

Cite this article

Zhang, R., Zhang, Z., Li, M. et al. A probabilistic semantic model for image annotation and multi-modal image retrieval. Multimedia Systems 12, 27–33 (2006). https://doi.org/10.1007/s00530-006-0025-1

Issue Date: August 2006
DOI: https://doi.org/10.1007/s00530-006-0025-1
