Abstract
Current object recognition systems aim to recognize numerous object classes under limited supervision. This paper provides a benchmark for evaluating progress on this fundamental task. Several methods have recently been proposed that exploit the commonalities between object classes in order to improve generalization accuracy. Such methods can be termed interclass transfer techniques. However, it is currently difficult to assess which of the proposed methods makes the best use of the shared structure of related classes. To facilitate both the development and the assessment of methods for dealing with multiple related classes, a new dataset containing images of several hundred mammal classes is provided, together with preliminary results of its use. The images in this dataset are organized into five levels of variability, and their labels include information on the objects’ identity, location and pose. From this dataset, a classification benchmark has been derived that requires fine distinctions between 72 mammal classes. It is then demonstrated that a recognition method that is highly successful on Caltech101 attains only limited accuracy (36.5%) on the current benchmark. Since this method does not utilize the shared structure between classes, the question remains whether interclass transfer methods can increase the accuracy to the level of human performance (90%). We suggest that a labeled benchmark of the type provided, containing a large number of related classes, is crucial for the development and evaluation of classification methods that make efficient use of interclass transfer.
Cite this article
Fink, M., Ullman, S. From Aardvark to Zorro: A Benchmark for Mammal Image Classification. Int J Comput Vis 77, 143–156 (2008). https://doi.org/10.1007/s11263-007-0066-8