Abstract
Robust image representations such as classemes [1], Object Bank (OB) [2], and the spatial pyramid representation (SPM) [3] have been proposed and show superior performance in various high-level visual recognition tasks. Our work is motivated by the need to explore the rich structural information encoded by these image representations. In this paper, we propose a novel Multi-Level Structured Image Coding approach that uncovers the structure embedded in representations with rich, regular structural information by learning a structured dictionary from them. Specifically, we choose Object Bank [2] to demonstrate our algorithm because it encodes both semantics and spatial location as structural information. Using the structured dictionary learned from Object Bank, we compute a lower-dimensional, more compact encoding of the image features while preserving and accentuating the rich semantic and spatial information of OB. Our framework is an unsupervised method based on minimizing the reconstruction error of the image and object codes, with an innovative multi-level structural regularization scheme. The object dictionary and the image code obtained by our model offer intriguing intuition about real-world image structures while preserving the informative structure of the original OB. We show that our more compact representation outperforms several state-of-the-art representations (including the original OB) on a wide range of high-level visual tasks such as scene classification, image retrieval, and annotation.
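The idea of minimizing reconstruction error under a structural regularizer can be illustrated with a minimal sketch. The abstract does not give the paper's objective, so the code below is not the multi-level model itself: it assumes a generic group-lasso objective, 0.5 * ||x - D s||^2 + lam * sum_g ||s_g||_2, with a fixed dictionary D, and solves it with plain iterative shrinkage-thresholding. The names `group_soft_threshold`, `group_sparse_code`, `objective`, the group layout, and the value of `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def group_soft_threshold(v, groups, tau):
    """Proximal operator of tau * sum_g ||v_g||_2 (block soft-thresholding)."""
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > tau:
            # Shrink the whole group toward zero; groups with norm <= tau stay zero.
            out[g] = (1.0 - tau / norm) * v[g]
    return out


def objective(x, D, s, groups, lam):
    """Reconstruction error plus a group-sparsity penalty (assumed form)."""
    residual = x - D @ s
    penalty = sum(np.linalg.norm(s[g]) for g in groups)
    return 0.5 * residual @ residual + lam * penalty


def group_sparse_code(x, D, groups, lam=0.1, n_iter=200):
    """Approximately minimize the objective over s with D held fixed."""
    # Step size 1/L, where L is the largest eigenvalue of D^T D.
    L = np.linalg.norm(D, 2) ** 2
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ s - x)  # gradient of the reconstruction term
        s = group_soft_threshold(s - grad / L, groups, lam / L)
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((20, 12))           # hypothetical dictionary: 12 atoms
    groups = [list(range(i, i + 3)) for i in range(0, 12, 3)]  # 4 groups of 3 atoms
    s_true = np.zeros(12)
    s_true[groups[1]] = [1.0, -2.0, 0.5]        # signal built from a single group
    x = D @ s_true
    s_hat = group_sparse_code(x, D, groups, lam=0.5)
    active = [i for i, g in enumerate(groups) if np.linalg.norm(s_hat[g]) > 0]
    print("active groups:", active)
    print("objective:", objective(x, D, s_hat, groups, 0.5))
```

In this sketch the group penalty switches whole blocks of coefficients on or off together, which is one simple way a code can reflect structure in the dictionary. A full dictionary-learning method would also update D, which is omitted here.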
References
Torresani, L., Szummer, M., Fitzgibbon, A.: Efficient Object Category Recognition Using Classemes. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part I. LNCS, vol. 6311, pp. 776–789. Springer, Heidelberg (2010)
Li, L.-J., Su, H., Lim, Y., Fei-Fei, L.: Objects as Attributes for Scene Classification. In: Kutulakos, K.N. (ed.) ECCV Workshops 2010, Part I. LNCS, vol. 6553, pp. 57–69. Springer, Heidelberg (2012)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)
Li, L.-J., Su, H., Xing, E., Fei-Fei, L.: Object bank: A high-level image representation for scene classification & semantic feature sparsification. In: NIPS (2010)
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: CVPR, pp. 3360–3367. IEEE (2010)
Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR (2009)
Vogel, J., Schiele, B.: Semantic modeling of natural scenes for content-based image retrieval. IJCV 72, 133–157 (2007)
Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (2009)
Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Computation (2006)
Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: NIPS (2006)
Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996)
Grosse, R., Raina, R., Kwong, H., Ng, A.: Shift-invariant sparse coding for audio classification. In: UAI (2007)
Raina, R., Battle, A., Lee, H., Packer, B., Ng, A.Y.: Self-taught learning: Transfer learning from unlabeled data. In: ICML (2007)
Bengio, S., Pereira, F., Singer, Y., Strelow, D.: Group sparse coding. In: NIPS (2009)
Jenatton, R., Mairal, J., Obozinski, G., Bach, F.: Proximal methods for sparse hierarchical dictionary learning. In: ICML (2010)
Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online learning for matrix factorization and sparse coding. JMLR 11, 19–60 (2010)
Jia, Y., Salzmann, M., Darrell, T.: Factorized latent spaces with structured sparsity. In: NIPS (2010)
Olshausen, B.A., Field, D.J.: Sparse coding of sensory inputs. Current Opinion in Neurobiology 14, 481–487 (2004)
Quattoni, A., Carreras, X., Collins, M., Darrell, T.: An efficient projection for ℓ1,∞ regularization. In: ICML (2009)
Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso. Preprint (2010), http://www-stat.stanford.edu/tibs
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci. (2009)
Li, L.-J., Fei-Fei, L.: What, where and who? Classifying events by scene and object recognition. In: ICCV (2007)
Quattoni, A., Torralba, A.: Recognizing indoor scenes. In: CVPR (2009)
Wang, C., Blei, D., Fei-Fei, L.: Simultaneous image classification and annotation. In: CVPR (2009)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, LJ., Zhu, J., Su, H., Xing, E.P., Fei-Fei, L. (2013). Multi-Level Structured Image Coding on High-Dimensional Image Representation. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37444-9_12
DOI: https://doi.org/10.1007/978-3-642-37444-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37443-2
Online ISBN: 978-3-642-37444-9
eBook Packages: Computer Science, Computer Science (R0)