Discovering Multipart Appearance Models from Captioned Images

Jamieson, Michael; Eskin, Yulia; Fazly, Afsaneh; Stevenson, Suzanne; Dickinson, Sven

doi:10.1007/978-3-642-15555-0_14

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6315)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Even a relatively unstructured captioned image set depicting a variety of objects in cluttered scenes contains strong correlations between caption words and repeated visual structures. We exploit these correlations to discover named objects and learn hierarchical models of their appearance. Revising and extending a previous technique for finding small, distinctive configurations of local features, our method assembles these co-occurring parts into graphs with greater spatial extent and flexibility. The resulting multipart appearance models remain scale-, translation- and rotation-invariant, but are more reliable detectors and provide better localization. We demonstrate improved annotation precision and recall on datasets to which the non-hierarchical technique was previously applied, and show extended spatial coverage of detected objects.
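The abstract's premise, that caption words correlate with repeated visual structures and that annotation quality is judged by precision and recall, can be illustrated with a small sketch. It assumes a hypothetical input format (a dict mapping image IDs to the set of words in each caption, plus the set of images in which one candidate part was detected) and uses the F-measure as the word-part association score. This is an illustrative choice, not necessarily the criterion used in the paper, and it ignores the spatial configuration of parts entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Precision, recall and F-measure of one part as a predictor of one word."""
    precision: float
    recall: float
    f_measure: float


def word_part_association(
    captions: dict[str, set[str]], detections: set[str], word: str
) -> Association:
    """Score how well a candidate visual part's detections predict a caption word.

    Hypothetical input format (an assumption, not the paper's data structures):
      captions   -- maps each image ID to the set of words in its caption
      detections -- IDs of the images in which the candidate part was detected
    The F-measure used here is an illustrative association score only.
    """
    # Images whose caption mentions the word (treated as ground truth).
    with_word = {img for img, words in captions.items() if word in words}
    # Ignore detections on images that have no caption entry.
    detected = {img for img in detections if img in captions}
    hits = len(with_word & detected)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(with_word) if with_word else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Association(precision, recall, f_measure)


# Toy usage with made-up image IDs and captions.
captions = {
    "img1": {"red", "car"},
    "img2": {"car", "toy"},
    "img3": {"dog", "toy"},
    "img4": {"car"},
}
part_detections = {"img1", "img2", "img3"}
score = word_part_association(captions, part_detections, "car")

# Pick the caption word this part predicts best under the toy score.
vocabulary = set().union(*captions.values())
best_word = max(
    vocabulary,
    key=lambda w: word_part_association(captions, part_detections, w).f_measure,
)
print(score, best_word)
```

In this toy data the part fires on two of the three "car" images but also on one image without "car", so precision and recall are both 2/3. It covers both "toy" images, which makes "toy" (F-measure 0.8) the word it predicts best.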

Similar content being viewed by others

Microsoft COCO: Common Objects in Context

Unsupervised Semantic Discovery Through Visual Patterns Detection

Photo Recall: Using the Internet to Label Your Photos

References

Jamieson, M., Fazly, A., Dickinson, S., Stevenson, S., Wachsmuth, S.: Using language to learn structured appearance models for image annotation. IEEE PAMI 32, 148–164 (2010)
Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D., Jordan, M.: Matching words and pictures. Journal of Machine Learning Research 3, 1107–1135 (2003)
Carneiro, G., Chan, A., Moreno, P., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE PAMI 29, 394–410 (2007)
Carbonetto, P., de Freitas, N., Barnard, K.: A statistical model for general contextual object recognition. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3021, pp. 350–362. Springer, Heidelberg (2004)
Monay, F., Gatica-Perez, D.: Modeling semantic aspects for cross-media image indexing. IEEE PAMI 29, 1802–1817 (2007)
Quattoni, A., Collins, M., Darrell, T.: Learning visual representations using images with captions. In: CVPR (2007)
Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google’s image search. In: CVPR (2005)
Crandall, D.J., Huttenlocher, D.P.: Weakly supervised learning of part-based spatial models for visual object recognition. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 16–29. Springer, Heidelberg (2006)
Kokkinos, I., Yuille, A.: HOP: Hierarchical object parsing. In: CVPR (2009)
Zhu, L., Lin, C., Huang, H., Chen, Y., Yuille, A.: Unsupervised structure learning: Hierarchical recursive composition, suspicious coincidence and competitive exclusion. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 759–773. Springer, Heidelberg (2008)
Bouchard, G., Triggs, B.: Hierarchical part-based visual object categorization. In: CVPR (2005)
Fidler, S., Boben, M., Leonardis, A.: Similarity-based cross-layered hierarchical representation for object categorization. In: CVPR (2008)
Epshtein, B., Ullman, S.: Feature hierarchies for object classification. In: ICCV (2005)
Ommer, B., Buhmann, J.: Learning the compositional nature of visual object categories for recognition. IEEE PAMI 32, 501–516 (2010)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60, 91–110 (2004)
Ke, Y., Sukthankar, R.: PCA-SIFT: A more distinctive representation for local image descriptors. In: CVPR (2004)

Author information

Authors and Affiliations

University of Toronto
Michael Jamieson, Yulia Eskin, Afsaneh Fazly, Suzanne Stevenson & Sven Dickinson

Editor information

Editors and Affiliations

GRASP Laboratory, University of Pennsylvania, 3330 Walnut Street, 19104, Philadelphia, PA, USA
Kostas Daniilidis
National Technical University of Athens, School of Electrical and Computer Engineering, 15773, Athens, Greece
Petros Maragos
Department of Applied Mathematics, Ecole Centrale de Paris, Grande Voie des Vignes, 92295, Chatenay-Malabry, France
Nikos Paragios

About this paper

Cite this paper

Jamieson, M., Eskin, Y., Fazly, A., Stevenson, S., Dickinson, S. (2010). Discovering Multipart Appearance Models from Captioned Images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15555-0_14

DOI: https://doi.org/10.1007/978-3-642-15555-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15554-3
Online ISBN: 978-3-642-15555-0
eBook Packages: Computer Science, Computer Science (R0)
