The Visual Extent of an Object

Uijlings, J. R. R.; Smeulders, A. W. M.; Scha, R. J. H.

doi:10.1007/s11263-011-0443-1

Published: 10 May 2011

Volume 96, pages 46–63, (2012)

International Journal of Computer Vision

Abstract

The visual extent of an object reaches beyond the object itself. This is a long-standing fact in psychology and is reflected in image retrieval techniques that aggregate statistics from the whole image in order to identify the object within. However, it is unclear to what degree and how the visual extent of an object affects classification performance. In this paper we investigate the visual extent of an object on the Pascal VOC dataset using a Bag-of-Words implementation with (colour) SIFT descriptors.

Our analysis is performed from two angles. (a) Not knowing the object location, we determine where in the image the support for object classification resides. We call this the normal situation. (b) Assuming that the object location is known, we evaluate the relative potential of the object and its surround, and of the object border and object interior. We call this the ideal situation. Our most important discoveries are: (i) Surroundings can adequately distinguish between groups of classes: furniture, animals, and land-vehicles. For distinguishing categories within one group the surroundings become a source of confusion. (ii) The physically rigid plane, bike, bus, car, and train classes are recognised by interior boundaries and shape, not by texture. The non-rigid animals dog, cat, cow, and sheep are recognised primarily by texture, i.e. fur, as their projected shape varies greatly. (iii) We confirm an early observation from human psychology (Biederman in Perceptual Organization, pp. 213–263, 1981): in the ideal situation with known object locations, recognition is no longer improved by considering surroundings. In contrast, in the normal situation with unknown object locations, the surroundings significantly contribute to the recognition of most classes.
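
To make the "ideal situation" in (b) concrete, the following is a minimal sketch of how Bag-of-Words statistics might be separated into an object part and a surround part when the object location is known. It assumes that descriptor positions and visual-word indices have already been computed, and that the object is given as an axis-aligned bounding box. The function names, the box convention, and the L1 normalisation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def bow_histogram(words, vocab_size):
    """Return an L1-normalised visual-word histogram (all zeros if empty)."""
    words = np.asarray(words, dtype=int)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def split_object_surround(points, words, box, vocab_size):
    """Build separate histograms for descriptors inside and outside a box.

    points: (N, 2) array-like of (x, y) descriptor centres (assumed input).
    words:  (N,) array-like of integer visual-word indices.
    box:    (x_min, y_min, x_max, y_max), bounds inclusive (hypothetical
            convention chosen for this sketch).
    """
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    words = np.asarray(words, dtype=int)
    x_min, y_min, x_max, y_max = box
    inside = (
        (points[:, 0] >= x_min) & (points[:, 0] <= x_max)
        & (points[:, 1] >= y_min) & (points[:, 1] <= y_max)
    )
    object_hist = bow_histogram(words[inside], vocab_size)
    surround_hist = bow_histogram(words[~inside], vocab_size)
    return object_hist, surround_hist


# Toy example: four descriptors, a vocabulary of three visual words.
points = [(1, 1), (5, 5), (6, 4), (9, 9)]
words = [0, 2, 2, 1]
object_hist, surround_hist = split_object_surround(
    points, words, box=(4, 3, 7, 6), vocab_size=3
)
print(object_hist)    # histogram of the two descriptors inside the box
print(surround_hist)  # histogram of the two descriptors outside the box
```

Training one classifier on each histogram (or on their concatenation) would then allow the object and its surround to be compared; in the "normal situation" (a), only the whole-image histogram `bow_histogram(words, vocab_size)` would be available.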

References

Agarwal, S., Awan, A., & Roth, D. (2004). Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11), 1475–1490.
Bar, M. (2004). Visual objects in context. Nature Reviews. Neuroscience, 5, 617–629.
Biederman, I. (1981). On the semantics of a glance at a scene. In Perceptual organization (pp. 213–263). Hillsdale: Lawrence Erlbaum.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.
Blaschko, M. B., & Lampert, C. H. (2009). Object localization with global and local context kernels. In British machine vision conference.
Burl, M. C., Weber, M., & Perona, P. (1998). A probabilistic approach to object recognition using local photometry and global geometry. In European conference on computer vision.
Carbonetto, P., de Freitas, N., & Barnard, K. (2004). A statistical model for general contextual object recognition. In European conference on computer vision. Berlin: Springer.
Csurka, G., Dance, C. R., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In ECCV international workshop on statistical learning in computer vision, Prague.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE conference on computer vision and pattern recognition.
Divvala, S. K., Hoiem, D., Hays, J. H., Efros, A. A., & Hebert, M. (2009). An empirical study of context in object detection. In IEEE conference on computer vision and pattern recognition.
Everingham, M., van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88, 303–338.
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In IEEE conference on computer vision and pattern recognition.
Fulkerson, B., Vedaldi, A., & Soatto, S. (2009). Class segmentation and object localization with superpixel neighborhoods. In IEEE international conference on computer vision.
Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42.
Gould, S., Fulton, R., & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In IEEE international conference on computer vision.
Harzallah, H., Jurie, F., & Schmid, C. (2009). Combining efficient object localization and image classification. In IEEE international conference on computer vision.
Hoiem, D., Efros, A. A., & Hebert, M. (2008). Putting objects in perspective. International Journal of Computer Vision, 80, 3–15.
Jiang, Y. G., Ngo, C. W., & Yang, J. (2007). Towards optimal bag-of-features for object categorization and semantic video retrieval. In ACM international conference on image and video retrieval (pp. 494–501). New York: ACM Press.
Jurie, F., & Triggs, B. (2005). Creating efficient codebooks for visual recognition. In IEEE international conference on computer vision.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE conference on computer vision and pattern recognition, New York.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Maji, S., Berg, A. C., & Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In IEEE conference on computer vision and pattern recognition.
Malisiewicz, T., & Efros, A. A. (2007). Improving spatial support for objects via multiple segmentations. In British machine vision conference, September 2007.
Malisiewicz, T., & Efros, A. A. (2009). Beyond categories: the visual memex model for reasoning about object relationships. In Neural information processing systems.
Marszałek, M., Schmid, C., Harzallah, H., & van de Weijer, J. (2007). Learning representations for visual object class recognition. In ICCV Pascal VOC 2007 challenge workshop.
Mikolajczyk, K., & Schmid, C. (2005). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1615–1630.
Moosmann, F., Triggs, B., & Jurie, F. (2006). Fast discriminative visual codebooks using randomized clustering forests. In Neural information processing systems (pp. 985–992).
Nedović, V., & Smeulders, A. W. M. (2010). Stages as models of scene geometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1673–1687.
Nowak, E., Jurie, F., & Triggs, B. (2006). Sampling strategies for bag-of-features image classification. In European conference on computer vision.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in Cognitive Sciences, 11, 520–527.
Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., & Belongie, S. (2007). Objects in context. In International conference on computer vision (pp. 1–8).
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2009). Textonboost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision, 81, 2–23.
Singhal, A., Luo, J., & Zhu, W. (2003). Probabilistic spatial context models for scene content understanding. In IEEE conference on computer vision and pattern recognition.
Sivic, J., & Zisserman, A. (2003). Video Google: a text retrieval approach to object matching in videos. In IEEE international conference on computer vision.
Smeaton, A. F., Over, P., & Kraaij, W. (2006). Evaluation campaigns and TRECVID. In ACM SIGMM international workshop on multimedia information retrieval.
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.
Tahir, M. A., van de Sande, K., Uijlings, J., Yan, F., Li, X., Mikolajczyk, K., Kittler, J., Gevers, T., & Smeulders, A. (2008). UVA and surrey @ Pascal VOC 2008. In ECCV Pascal VOC 2008 challenge workshop.
Tuytelaars, T., & Schmid, C. (2007). Vector quantizing feature space with a regular lattice. In IEEE international conference on computer vision.
Uijlings, J. R. R., Smeulders, A. W. M., & Scha, R. J. H. (2009). What is the spatial extent of an object? In IEEE conference on computer vision and pattern recognition.
Uijlings, J. R. R., Smeulders, A. W. M., & Scha, R. J. H. (2010, in press). Real-time visual concept classification. IEEE Transactions on Multimedia. http://dx.doi.org/10.1109/TMM.2010.2052027
Ullah, M. M., Parizi, S. N., & Laptev, I. (2010). Improving bag-of-features action recognition with non-local cues. In British machine vision conference.
van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1582–1596.
Vedaldi, A., & Zisserman, A. (2010). Efficient additive kernels via explicit feature maps. In IEEE conference on computer vision and pattern recognition.
Wolf, L., & Bileschi, S. (2006). A critical view of context. International Journal of Computer Vision, 69, 251–261.
Zhang, J., Marszałek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: a comprehensive study. International Journal of Computer Vision, 73(2), 213–238.

Author information

Authors and Affiliations

Institute for Informatics, ISIS Lab, Science Park 107, 1098 XG, Amsterdam, The Netherlands
J. R. R. Uijlings & A. W. M. Smeulders
Institute for Logic, Language and Computation, Amsterdam, The Netherlands
R. J. H. Scha

Corresponding author

Correspondence to J. R. R. Uijlings.

Electronic Supplementary Material

Electronic supplementary material (156 KB) accompanies the online version of this article.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Uijlings, J.R.R., Smeulders, A.W.M. & Scha, R.J.H. The Visual Extent of an Object. Int J Comput Vis 96, 46–63 (2012). https://doi.org/10.1007/s11263-011-0443-1

Received: 27 July 2010
Accepted: 28 March 2011
Published: 10 May 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s11263-011-0443-1
