Abstract
Representing images with the Bag-of-Words (BOW) model has shown excellent performance in image classification and retrieval. However, the model still has limitations, such as the presence of many noisy visual words and the difficulty of choosing the vocabulary size. To circumvent these drawbacks, this paper concentrates on tuning a compact, robust, and therefore efficient BOW model for image representation, even with a universal size. The proposed approach increases expressive power by employing Sparse Partial Least Squares (SPLS) to tune the traditional high-dimensional BOW model and to learn a more discriminative subspace with 10 latent variables. The performance of the learned BOW models on image classification is studied through extensive experiments on the VOC 2006 dataset. Empirical results indicate that the proposed method yields stable results and outperforms both traditional BOW models with various vocabulary sizes and PCA with SVM.
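To make the idea in the abstract concrete, the following is a minimal sketch of projecting BOW histograms onto a few sparse latent directions. It is not the authors' implementation. It assumes a binary label in {+1, -1}, uses a simplified PLS1-style deflation with a soft-thresholded direction vector, and all names (`bow_histogram`, `spls_scores`, the threshold `eta`) are hypothetical. The toy data is invented purely for illustration.

```python
import math

def bow_histogram(word_ids, vocab_size):
    """L1-normalised histogram of visual-word indices for one image."""
    hist = [0.0] * vocab_size
    for w in word_ids:
        hist[w] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def center_columns(X):
    """Subtract the column mean from every feature."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def sparse_direction(X, y, eta):
    """Soft-threshold X^T y at eta * max|X^T y|, then scale to unit norm.

    Assumption: eta in [0, 1) controls sparsity; words whose covariance
    with the label falls below the cutoff get weight zero.
    """
    d = len(X[0])
    z = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(d)]
    cutoff = eta * max(abs(v) for v in z)
    w = [math.copysign(max(abs(v) - cutoff, 0.0), v) for v in z]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    return [v / norm for v in w]

def spls_scores(X, y, n_components=10, eta=0.5):
    """Return sparse directions and latent scores, one pair per component.

    Directions after the first are expressed relative to the deflated X.
    """
    X = center_columns(X)
    y_mean = sum(y) / len(y)
    y = [yi - y_mean for yi in y]
    n, d = len(X), len(X[0])
    directions, scores = [], []
    for _ in range(n_components):
        w = sparse_direction(X, y, eta)
        t = [sum(a * b for a, b in zip(row, w)) for row in X]
        tt = sum(v * v for v in t)
        if tt < 1e-12:  # nothing left to explain
            break
        # Deflate X by removing the part explained by the score vector t.
        p = [sum(t[i] * X[i][j] for i in range(n)) / tt for j in range(d)]
        X = [[X[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        directions.append(w)
        scores.append(t)
    return directions, scores

# Toy data (invented): word 0 marks the positive class, word 1 the
# negative class, and words 2 and 3 are uninformative noise.
images = [[0, 0, 0, 2], [0, 0, 3, 0], [1, 1, 1, 2], [1, 1, 3, 1]]
labels = [1.0, 1.0, -1.0, -1.0]
X = [bow_histogram(ids, 4) for ids in images]
directions, scores = spls_scores(X, labels, n_components=1, eta=0.5)
print(directions[0])
print(scores[0])
```

In this sketch the noisy words receive zero weight in the direction vector, and the resulting low-dimensional scores are what would be passed to a classifier such as an SVM. The paper's setting of 10 latent variables corresponds to `n_components=10` here.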
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, J., Zeng, G. (2011). Learning Bag-of-Words Models Using Sparse Partial Least Squares. In: Wang, Y., Li, T. (eds) Foundations of Intelligent Systems. Advances in Intelligent and Soft Computing, vol 122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25664-6_51
DOI: https://doi.org/10.1007/978-3-642-25664-6_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25663-9
Online ISBN: 978-3-642-25664-6
eBook Packages: Engineering, Engineering (R0)