Abstract
Image retrieval and categorization may need to consider several types of visual features and the spatial information between them (e.g., different points of view of an image). This paper presents a novel approach that extends the language modeling approach from information retrieval to the problem of graph-based image retrieval and categorization. Such a versatile graph model is needed to represent the multiple points of view of images. A language model is defined on these graphs to enable fast graph matching. We present experiments carried out with several instances of the proposed model on two image collections: one composed of 3,849 tourist images and another composed of 3,633 images captured by a mobile robot. Experimental results show that the visual graph model (VGM) improves on the accuracy of the standard language model (LM) and outperforms the Support Vector Machine (SVM) method.
Acknowledgements
This work was supported by the French National Agency of Research (ANR-06-MDCA-002). Pham Trong-Ton would like to thank the Merlion programme of the French Embassy in Singapore for its support during his Ph.D. study.
Cite this article
Pham, TT., Mulhem, P., Maisonnasse, L. et al. Visual graph modeling for scene recognition and mobile robot localization. Multimed Tools Appl 60, 419–441 (2012). https://doi.org/10.1007/s11042-010-0598-8
DOI: https://doi.org/10.1007/s11042-010-0598-8