Abstract
The appearance of an object is composed of local structure. This local structure can be described and characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This article presents a technique in which appearances of objects are represented by the joint statistics of such local neighborhood operators. As such, this represents a new class of appearance-based techniques for computer vision. Based on joint statistics, the paper develops techniques for the identification of multiple objects at arbitrary positions and orientations in a cluttered scene. Experiments show that these techniques can identify over 100 objects in the presence of major occlusions. Most remarkably, the techniques have low complexity and therefore run in real time.
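As a rough illustration of representing appearance by the joint statistics of local operators, the following sketch builds a two-dimensional histogram of first-order Gaussian derivative responses and compares two such histograms. It is a minimal sketch under assumptions, not the authors' implementation: the abstract does not specify the filter set, the number of bins, the value range, or the comparison measure, so the choice of (Dx, Dy) features, 9 bins per axis, a response range of ±1, histogram intersection, and all function names are hypothetical.

```python
import math

def gaussian_kernels(sigma=1.0, radius=3):
    """Sampled 1-D Gaussian (normalized to sum 1) and its first derivative."""
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2 * sigma * sigma)) for x in xs]
    total = sum(g)
    g = [v / total for v in g]
    dg = [-x / (sigma * sigma) * v for x, v in zip(xs, g)]
    return g, dg

def filter_rows(img, kernel):
    """Convolve every row with a 1-D kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    return [
        [sum(k * row[min(max(x + radius - i, 0), width - 1)]
             for i, k in enumerate(kernel))
         for x in range(width)]
        for row in img
    ]

def transpose(img):
    return [list(col) for col in zip(*img)]

def filter_cols(img, kernel):
    """Convolve every column by filtering the rows of the transposed image."""
    return transpose(filter_rows(transpose(img), kernel))

def local_features(img, sigma=1.0):
    """Per-pixel responses (Dx, Dy) of separable Gaussian derivative filters."""
    g, dg = gaussian_kernels(sigma)
    dx = filter_cols(filter_rows(img, dg), g)  # derivative in x, smoothing in y
    dy = filter_rows(filter_cols(img, dg), g)  # derivative in y, smoothing in x
    return dx, dy

def joint_histogram(img, bins=9, limit=1.0, sigma=1.0):
    """Normalized joint histogram of (Dx, Dy) over all pixels.

    Assumed settings: responses are clipped to [-limit, limit]; an odd bin
    count keeps a zero response in the middle of a bin rather than on an edge.
    """
    dx, dy = local_features(img, sigma)

    def cell(v):
        idx = int((v + limit) / (2 * limit) * bins)
        return min(max(idx, 0), bins - 1)

    hist = [[0.0] * bins for _ in range(bins)]
    count = 0
    for row_x, row_y in zip(dx, dy):
        for vx, vy in zip(row_x, row_y):
            hist[cell(vx)][cell(vy)] += 1.0
            count += 1
    return [[c / count for c in row] for row in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2))
```

Because the histogram discards pixel positions, a scene and a translated copy of it (with the object kept away from the image border) should yield an intersection close to 1, which gives a sense of how matching can proceed without point-to-point correspondence. Handling rotation, scale, and occlusion as described in the abstract would need more than this sketch provides.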
Cite this article
Schiele, B., Crowley, J.L. Recognition without Correspondence using Multidimensional Receptive Field Histograms. International Journal of Computer Vision 36, 31–50 (2000). https://doi.org/10.1023/A:1008120406972
DOI: https://doi.org/10.1023/A:1008120406972