
Multi-view Occlusion Reasoning for Probabilistic Silhouette-Based Dynamic Scene Reconstruction

Published in: International Journal of Computer Vision

Abstract

In this paper, we present an algorithm to probabilistically estimate object shapes in a dynamic 3D scene from silhouette information derived from multiple geometrically calibrated video camcorders. The scene is represented as a 3D volume. Every object in the scene is associated with a distinct label that represents its presence at each voxel location. The label links together automatically learned, view-specific appearance models of the respective object, so that photometric calibration of the cameras can be avoided. Generative probabilistic sensor models are derived by analyzing the dependencies between the sensor observations and the object labels. Bayesian reasoning is then applied to achieve reconstruction that is robust to real-world challenges such as lighting variations and changing backgrounds. Our main contribution is to explicitly model the visual occlusion process and to show that: (1) static objects (such as trees or lamp posts), as parts of the pre-learned background model, can be automatically recovered as a byproduct of the inference; and (2) ambiguities due to inter-occlusion between multiple dynamic objects can be alleviated, drastically improving the final reconstruction quality. We evaluate our framework on several indoor and outdoor real-world datasets.
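
To make the silhouette-fusion idea underlying the abstract concrete, the following is a minimal sketch of per-voxel Bayesian occupancy estimation from calibrated silhouettes. It is not the authors' occlusion-aware, multi-label model: the camera projection helper, the false-positive/false-negative rates, and the view-independence assumption are all illustrative simplifications of ours.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P; returns pixel (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def voxel_occupancy(voxel_centers, cameras, silhouettes,
                    p_fg_given_occ=0.9,   # P(pixel in silhouette | voxel occupied), assumed
                    p_fg_given_free=0.1,  # P(pixel in silhouette | voxel free), assumed
                    prior_occ=0.5):
    """Fuse binary silhouette maps from calibrated views into per-voxel occupancy
    posteriors, assuming views are conditionally independent given occupancy."""
    log_odds = np.full(len(voxel_centers), np.log(prior_occ / (1 - prior_occ)))
    for P, sil in zip(cameras, silhouettes):
        h, w = sil.shape
        for i, X in enumerate(voxel_centers):
            u, v = project(P, X)
            ui, vi = int(round(u)), int(round(v))
            if not (0 <= ui < w and 0 <= vi < h):
                continue  # voxel not observed by this view
            if sil[vi, ui]:
                log_odds[i] += np.log(p_fg_given_occ / p_fg_given_free)
            else:
                log_odds[i] += np.log((1 - p_fg_given_occ) / (1 - p_fg_given_free))
    return 1.0 / (1.0 + np.exp(-log_odds))  # posterior P(occupied | all silhouettes)
```

The paper's model goes beyond this baseline by attaching per-object labels (with view-specific appearance models) to voxels and by making the occlusion process an explicit part of the generative sensor model, which is what allows static occluders to be recovered and inter-occlusions between dynamic objects to be resolved.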



Author information

Corresponding author

Correspondence to Li Guan.

Cite this article

Guan, L., Franco, JS. & Pollefeys, M. Multi-view Occlusion Reasoning for Probabilistic Silhouette-Based Dynamic Scene Reconstruction. Int J Comput Vis 90, 283–303 (2010). https://doi.org/10.1007/s11263-010-0341-y
