Visual graph modeling for scene recognition and mobile robot localization

Multimedia Tools and Applications

Abstract

Image retrieval and categorization may need to consider several types of visual features and the spatial information between them (e.g., different points of view of an image). This paper presents a novel approach that extends the language modeling approach from information retrieval to graph-based image retrieval and categorization. Such a versatile graph model is needed to represent the multiple points of view of images. A language model is defined on these graphs to enable fast graph matching. We present experiments with several instances of the proposed model on two image collections: one composed of 3,849 tourist images and another composed of 3,633 images captured by a mobile robot. Experimental results show that the visual graph model (VGM) improves on the accuracy of the standard language model (LM) and outperforms the Support Vector Machine (SVM) method.
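
To illustrate the language modeling idea in general terms, the following is a minimal sketch and not the paper's implementation. It assumes each image has already been reduced to counts of discrete visual concepts, and it scores a query image against a document image with a unigram language model using Jelinek-Mercer smoothing. The function name, the smoothing weight `lam`, and the toy counts are all hypothetical, and the sketch omits the relations between concepts that a graph representation would add.

```python
import math
from collections import Counter

def log_likelihood(query, doc, collection, lam=0.5):
    """Log P(query | doc) for a unigram model with Jelinek-Mercer smoothing.

    Assumption: every concept in the query occurs in the collection,
    so the smoothed probability is never zero.
    """
    doc_len = sum(doc.values())
    coll_len = sum(collection.values())
    score = 0.0
    for concept, count in query.items():
        p_doc = doc[concept] / doc_len if doc_len else 0.0
        p_coll = collection[concept] / coll_len
        # Interpolate the document estimate with the collection estimate.
        p = (1 - lam) * p_doc + lam * p_coll
        score += count * math.log(p)
    return score

# Toy data (hypothetical): concept counts for two indexed images.
docs = {
    "street": Counter({"c1": 4, "c2": 1}),
    "office": Counter({"c2": 3, "c3": 2}),
}
collection = sum(docs.values(), Counter())
query = Counter({"c1": 2, "c2": 1})

# Rank indexed images by how likely they are to generate the query.
best = max(docs, key=lambda name: log_likelihood(query, docs[name], collection))
print(best)
```

Smoothing with collection statistics keeps the score finite when a query concept is absent from a particular image, which is why the "office" image still receives a score despite containing no `c1`.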

Notes

  1. http://imageclef.org/2009/robot

  2. http://www.csie.ntu.edu.tw/~cjlin/libsvm/

  3. http://ltilib.sourceforge.net/

Acknowledgements

This work was supported by the French National Agency of Research (ANR-06-MDCA-002). Pham Trong-Ton would like to thank the Merlion programme of the French Embassy in Singapore for its support during his Ph.D. study.

Author information

Corresponding author

Correspondence to Trong-Ton Pham.

About this article

Cite this article

Pham, TT., Mulhem, P., Maisonnasse, L. et al. Visual graph modeling for scene recognition and mobile robot localization. Multimed Tools Appl 60, 419–441 (2012). https://doi.org/10.1007/s11042-010-0598-8

  • DOI: https://doi.org/10.1007/s11042-010-0598-8
