From Aardvark to Zorro: A Benchmark for Mammal Image Classification

International Journal of Computer Vision

Abstract

Current object recognition systems aim to recognize numerous object classes under conditions of limited supervision. This paper provides a benchmark for evaluating progress on this fundamental task. Several recently proposed methods exploit the commonalities between object classes in order to improve generalization accuracy. Such methods can be termed interclass transfer techniques. However, it is currently difficult to assess which of the proposed methods makes the most of the shared structure of related classes. To facilitate both the development and the assessment of methods for dealing with multiple related classes, a new dataset containing images of several hundred mammal classes is provided, together with preliminary results of its use. The images in this dataset are organized into five levels of variability, and their labels include information on the objects’ identity, location and pose. From this dataset, a classification benchmark has been derived that requires fine distinctions between 72 mammal classes. It is then demonstrated that a recognition method that is highly successful on the Caltech101 dataset attains limited accuracy (36.5%) on the current benchmark. Since this method does not utilize the shared structure between classes, the question remains whether interclass transfer methods can increase the accuracy to the level of human performance (90%). We suggest that a labeled benchmark of the type provided, containing a large number of related classes, is crucial for the development and evaluation of classification methods that make efficient use of interclass transfer.

Author information

Corresponding author

Correspondence to Michael Fink.

About this article

Cite this article

Fink, M., Ullman, S. From Aardvark to Zorro: A Benchmark for Mammal Image Classification. Int J Comput Vis 77, 143–156 (2008). https://doi.org/10.1007/s11263-007-0066-8

  • DOI: https://doi.org/10.1007/s11263-007-0066-8
