Inference and Learning with Hierarchical Shape Models

Kokkinos, Iasonas; Yuille, Alan

doi:10.1007/s11263-010-0398-7

Open access
Published: 28 October 2010

Volume 93, pages 201–225 (2011)

International Journal of Computer Vision

Abstract

In this work we introduce a hierarchical representation for object detection. We represent an object in terms of parts composed of contours corresponding to object boundaries and symmetry axes; these are in turn related to edge and ridge features that are extracted from the image.

We propose a coarse-to-fine algorithm for efficient detection which exploits the hierarchical nature of the model. This provides a tractable framework to combine bottom-up and top-down computation. We learn our models from training images where only the bounding box of the object is provided. We automate the decomposition of an object category into parts and contours, and discriminatively learn the cost function that drives the matching of the object to the image using Multiple Instance Learning.

Using shape-based information, we obtain state-of-the-art localization results on the UIUC and ETHZ datasets.
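
To make the Multiple Instance Learning idea mentioned above concrete, here is a minimal, generic sketch and not the authors' implementation. It assumes a linear scoring function over hypothetical feature vectors: each training image is a "bag" of candidate matches, a bag is scored by its best-scoring instance, and a hinge-loss subgradient step updates the weights using that instance only. The names `bag_score` and `mil_hinge_step`, the learning rate, and the toy features are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def bag_score(w: Vector, bag: Sequence[Vector]) -> float:
    """A bag is scored by its best-scoring instance (max over instances)."""
    return max(dot(w, x) for x in bag)


def mil_hinge_step(w: Vector, bag: Sequence[Vector], label: int,
                   lr: float = 0.1) -> List[float]:
    """One subgradient-style step on max(0, 1 - label * bag_score(w, bag)).

    label is +1 (some instance in the bag should match) or -1 (no instance
    should match). For positive bags the arg-max instance is held fixed
    during the step, a common heuristic rather than an exact gradient.
    """
    best = max(bag, key=lambda x: dot(w, x))
    if label * dot(w, best) >= 1.0:
        return list(w)  # margin satisfied, no update
    return [wi + lr * label * xi for wi, xi in zip(w, best)]


if __name__ == "__main__":
    # Toy data (assumed): two-dimensional features, one positive and one negative bag.
    bags = [([[1.0, 0.0], [0.0, 1.0]], +1),
            ([[0.0, 2.0], [0.0, 1.0]], -1)]
    w = [0.0, 0.0]
    for _ in range(20):
        for bag, label in bags:
            w = mil_hinge_step(w, bag, label)
    print("weights:", w)
    print("bag scores:", [bag_score(w, bag) for bag, _ in bags])
```

Scoring a bag by its maximum means a positive image only needs one good candidate match, which is what allows training from bounding boxes alone; the paper's actual cost function and optimization may differ from this sketch.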

Author information

Authors and Affiliations

Department of Applied Mathematics, Ecole Centrale Paris and Equipe Galen, INRIA-Saclay, France
Iasonas Kokkinos
Department of Statistics and Computer Science, University of California at Los Angeles, Los Angeles, USA
Alan Yuille

Corresponding author

Correspondence to Iasonas Kokkinos.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Kokkinos, I., Yuille, A. Inference and Learning with Hierarchical Shape Models. Int J Comput Vis 93, 201–225 (2011). https://doi.org/10.1007/s11263-010-0398-7

Received: 19 October 2009
Accepted: 29 September 2010
Published: 28 October 2010
Issue Date: June 2011
DOI: https://doi.org/10.1007/s11263-010-0398-7
