Abstract
In this work we introduce a hierarchical representation for object detection. We represent an object in terms of parts composed of contours corresponding to object boundaries and symmetry axes; these are in turn related to edge and ridge features that are extracted from the image.
We propose a coarse-to-fine algorithm for efficient detection that exploits the hierarchical nature of the model, providing a tractable framework for combining bottom-up and top-down computation. We learn our models from training images in which only the bounding box of the object is provided: we automate the decomposition of an object category into parts and contours, and use Multiple Instance Learning to discriminatively learn the cost function that drives the matching of the object to the image.
Using shape-based information, we obtain state-of-the-art localization results on the UIUC and ETHZ datasets.
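To make the coarse-to-fine idea concrete, the following is a minimal illustrative sketch and not the algorithm of this article. It assumes a square 2-D score map whose side is a power of two, and it uses 2x2 max pooling as an upper bound on the best score inside each coarse cell. A best-first search then refines only the most promising cells. The names `build_max_pyramid`, `coarse_to_fine_argmax`, and `demo_scores` are hypothetical and chosen for illustration only.

```python
import heapq


def build_max_pyramid(scores):
    """Return a list of grids: level 0 is the input, each next level max-pools 2x2 blocks.

    Assumption: `scores` is a square list of lists whose side is a power of two.
    """
    pyramid = [scores]
    while len(pyramid[-1]) > 1:
        fine = pyramid[-1]
        half = len(fine) // 2
        coarse = [
            [
                max(
                    fine[2 * r][2 * c],
                    fine[2 * r][2 * c + 1],
                    fine[2 * r + 1][2 * c],
                    fine[2 * r + 1][2 * c + 1],
                )
                for c in range(half)
            ]
            for r in range(half)
        ]
        pyramid.append(coarse)
    return pyramid


def coarse_to_fine_argmax(scores):
    """Best-first search from the coarsest cell down to a full-resolution cell.

    Each coarse cell stores an upper bound on every score it covers, so the
    first full-resolution cell popped from the queue is a global maximum.
    Returns ((row, col), score, number_of_cells_popped).
    """
    pyramid = build_max_pyramid(scores)
    top = len(pyramid) - 1
    # heapq is a min-heap, so bounds are negated to pop the largest first.
    heap = [(-pyramid[top][0][0], top, 0, 0)]
    popped = 0
    while heap:
        neg_bound, level, r, c = heapq.heappop(heap)
        popped += 1
        if level == 0:
            return (r, c), -neg_bound, popped
        # Refine: push the four children of this cell with their own bounds.
        for dr in (0, 1):
            for dc in (0, 1):
                rr, cc = 2 * r + dr, 2 * c + dc
                heapq.heappush(heap, (-pyramid[level - 1][rr][cc], level - 1, rr, cc))
    raise ValueError("empty score map")


# Hypothetical 4x4 score map used only to exercise the sketch.
demo_scores = [
    [1, 2, 0, 1],
    [0, 3, 1, 2],
    [2, 1, 9, 0],
    [1, 0, 2, 4],
]
position, value, popped = coarse_to_fine_argmax(demo_scores)
print(position, value, popped)
```

In this toy version the bounds come from max pooling, which itself visits every cell, so the saving shows up only in the search phase. A practical system would need bounds that are cheap to compute from the coarse level of the model.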
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Kokkinos, I., Yuille, A. Inference and Learning with Hierarchical Shape Models. Int J Comput Vis 93, 201–225 (2011). https://doi.org/10.1007/s11263-010-0398-7
DOI: https://doi.org/10.1007/s11263-010-0398-7