M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene

International Journal of Computer Vision

Abstract

When occlusion is minimal, a single camera is generally sufficient to detect and track objects. However, when the density of objects is high, the resulting occlusion and lack of visibility suggest the use of multiple collaborating cameras, so that an object is detected using information available from all the cameras in the scene.

In this paper, we present a system that segments, detects, and tracks multiple people in a cluttered scene using multiple synchronized surveillance cameras located far from each other. The system is fully automatic and makes decisions about object detection and tracking using evidence collected from many pairs of cameras. Innovations that help us tackle the problem include a region-based stereo algorithm capable of finding 3D points inside an object knowing only the projections of the object (as a whole) in two views, a segmentation algorithm using Bayesian classification, and the use of occlusion analysis to combine evidence from different camera pairs.
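As an illustration only, the general idea of Bayesian pixel classification can be sketched as follows. This is a generic sketch, not the segmentation algorithm of the paper: the function name `classify_pixels`, the array layout, and the assumption that per-class color likelihoods and priors are already available are all hypothetical choices made for the example.

```python
import numpy as np

def classify_pixels(likelihoods, priors):
    """Assign each pixel to the class with the highest posterior probability.

    likelihoods: array of shape (K, H, W); entry [k, y, x] is the assumed
                 likelihood of the observed color at (y, x) under class k
                 (for example, one class per person plus background).
    priors:      array of shape (K,) or (K, H, W) with prior class probabilities.
    Returns (labels, posteriors) with shapes (H, W) and (K, H, W).
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    priors = np.asarray(priors, dtype=float)
    if priors.ndim == 1:
        # Broadcast a single prior per class over the whole image.
        priors = priors[:, None, None]
    # Bayes' rule: posterior is proportional to likelihood times prior.
    joint = likelihoods * priors
    evidence = joint.sum(axis=0, keepdims=True)
    # Guard against division by zero where no class explains the pixel.
    posteriors = joint / np.maximum(evidence, 1e-12)
    labels = posteriors.argmax(axis=0)
    return labels, posteriors

# Hypothetical 1x2 image with two classes and equal priors.
demo_likelihoods = [[[0.8, 0.1]],
                    [[0.2, 0.6]]]
demo_labels, demo_posteriors = classify_pixels(demo_likelihoods, [0.5, 0.5])
```

In this toy input the first pixel is assigned to class 0 and the second to class 1. Passing a per-pixel prior of shape (K, H, W) instead of one prior per class is one way that location-dependent evidence could enter such a classifier.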

The system has been tested with different densities of people in the scene, which helps us determine the number of cameras required for a particular density. Experiments have also been conducted to verify and quantify the efficacy of the occlusion analysis scheme.




Cite this article

Mittal, A., Davis, L.S. M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene. International Journal of Computer Vision 51, 189–203 (2003). https://doi.org/10.1023/A:1021849801764
