Putting the User in the Loop for Image-Based Modeling

International Journal of Computer Vision

Abstract

We refer to the task of recovering the 3D structure of an object or a scene from 2D images as image-based modeling. In this paper, we formulate this task as a discrete optimization problem solved via energy minimization. Within this standard framework of a Markov random field (MRF) defined over the image, we present algorithms that allow the user to interact intuitively with the reconstruction. We first introduce an algorithm in which the user guides the process of image-based modeling to find and model the object of interest by manually interacting with the nodes of the graph. Using this algorithm, we develop end-user applications that allow 3D modeling of the object of interest on a mobile device and 3D printing of that object. We also propose an alternative, active learning algorithm that guides the user input. An initial attempt is made at reconstructing the scene without supervision. Given this reconstruction, the active learning algorithm uses intuitive cues to quantify its uncertainty and suggests regions, querying the user to provide support for the uncertain regions via simple scribbles. These constraints are used to update the unary and pairwise energies, which, when minimized, lead to better reconstructions. We show through machine experiments and a user study that the proposed approach intelligently queries users for constraints, and that users achieve better reconstructions of the scene faster, especially for scenes with textureless surfaces, which lack the strong textural or structural cues that algorithms typically require.
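The idea of updating unary energies from user scribbles and re-minimizing can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Potts pairwise term, uses iterated conditional modes (ICM) as a simple stand-in for a graph-cut solver, and treats the weight `lam = 0.5` as an assumed pairwise weight. All function names (`energy`, `apply_scribbles`, `icm`) are hypothetical.

```python
# Illustrative sketch only: a tiny MRF over superpixel nodes.
# ICM is used here as a simple stand-in for graph cuts; names are hypothetical.

def energy(labels, unary, edges, lam=0.5):
    """Unary costs plus a Potts penalty (weight lam, an assumption) per disagreeing edge."""
    data = sum(unary[i][labels[i]] for i in range(len(labels)))
    smooth = sum(1 for i, j in edges if labels[i] != labels[j])
    return data + lam * smooth


def apply_scribbles(unary, scribbles, big=1e6):
    """Return a copy of `unary` in which each scribbled node is forced to its label."""
    out = [row[:] for row in unary]
    for node, label in scribbles.items():
        out[node] = [0.0 if l == label else big for l in range(len(out[node]))]
    return out


def icm(unary, edges, lam=0.5, max_iters=20):
    """Greedy coordinate-wise minimization; finds a local minimum of the energy."""
    n = len(unary)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Start from the label with the lowest unary cost at each node.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unary]
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            def local(l):
                return unary[i][l] + lam * sum(1 for j in neighbors[i] if labels[j] != l)
            best = min(range(len(unary[i])), key=local)
            if local(best) < local(labels[i]):  # strict decrease guarantees termination
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    # Four nodes in a chain, two labels (for example, two candidate planes).
    unary = [[0.0, 1.0], [0.4, 0.6], [0.6, 0.4], [1.0, 0.0]]
    edges = [(0, 1), (1, 2), (2, 3)]
    print(icm(unary, edges))
    # A scribble on node 1 asserting label 1 changes the solution.
    print(icm(apply_scribbles(unary, {1: 1}), edges))
```

In this toy setting a scribble acts as a hard constraint on the unary term only; the abstract also mentions updating pairwise energies, which this sketch leaves out.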

Notes

  1. Superpixels are used to help reduce computational complexity.

  2. We use mean-shift segmentation (Comaniciu and Meer 2002) to break an image into about a thousand superpixels.

  3. The parameter \(\lambda \) is set to 0.5.

  4. http://chenlab.ece.cornell.edu/projects/Interactive_3D.

  5. We use graph-based segmentation (Felzenszwalb and Huttenlocher 2004) to break each image down into about 400 superpixels.

  6. http://chenlab.ece.cornell.edu/projects/ActiveLearningFor3D.

  7. http://chenlab.ece.cornell.edu/projects/ActiveLearningFor3D.

  8. http://chenlab.ece.cornell.edu/projects/iModel.

  9. The 3D printouts were obtained using the online service http://www.shapeways.com.

References

  • Bagon, S. (2006). Matlab wrapper for graph cut. http://www.wisdom.weizmann.ac.il/bagon. Accessed 7 March 2013.

  • Bartoli, A. (2007). A random sampling strategy for piecewise planar scene segmentation. Computer Vision and Image Understanding, 105(1), 42–59.

  • Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2011). Interactively co-segmenting topically related images with intelligent scribble guidance. International Journal of Computer Vision, 93(3), 273–292.

  • Baumgart, B.G. (1974). Geometric modeling for computer vision. PhD thesis, Stanford University.

  • Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, 26(9), 1124–1137.

  • Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.

  • Campbell, N., Vogiatzis, G., Hernández, C., & Cipolla, R. (2007). Automatic 3d object segmentation in multiple views using volumetric graph-cuts. In BMVC, Bristol.

  • Campbell, N.D., Vogiatzis, G., Hernández, C., & Cipolla, R. (2008). Using multiple hypotheses to improve depth-maps for multi-view stereo. In ECCV.

  • Chen, Z., Chou, H.L., & Chen, W.C. (2008). A performance controllable octree construction method. In ICPR.

  • Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Towards scalable dataset construction: An active learning approach. In ECCV.

  • Comaniciu, D., & Meer, P. (2002). Mean shift: a robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, 24(5), 603–619.

  • Criminisi, A., Reid, I.D., & Zisserman, A. (1999). Single view metrology. In ICCV.

  • Debevec, P., Taylor, C., & Malik, J. (1996). Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH.

  • Fang, Y. H., Chou, H. L., & Chen, Z. (2003). 3D Shape recovery of complex objects from multiple silhouette images. Pattern Recognition Letters, 24(9–10), 1279–1293.

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.

  • Forbes, K., Nicolls, F., de Jager, G., & Voigt, A. (2006). Shape-from-silhouette with two mirrors and an uncalibrated camera. In ECCV, (pp. 165–178).

  • Furukawa, Y., & Ponce, J. (2009). Accurate, dense, and robust multi-view stereopsis. Pattern Analysis and Machine Intelligence, 32(8), 1362–1376.

  • Furukawa, Y., Curless, B., Seitz, S., & Szeliski, R. (2009). Reconstructing building interiors from images. In ICCV.

  • Furukawa, Y., Curless, B., Seitz, S.M., & Szeliski, R. (2010). Towards internet-scale multi-view stereo. In CVPR.

  • Gallup, D., Frahm, J., & Pollefeys, M. (2010). Piecewise planar and non-planar stereo for urban scene reconstruction. In CVPR.

  • Goesele, M., Snavely, N., Curless, B., Hoppe, H., & Seitz, S.M. (2007). Multi-view stereo for community photo collections. In ICCV.

  • Gosselin, P. H., & Cord, M. (2008). Active learning methods for interactive image retrieval. IEEE Transactions on Image Processing, 17(7), 1200–1211.

  • Hengel, A., Dick, A. R., Thormählen, T., Ward, B., & Torr, P. H. S. (2007). Videotrace: Rapid interactive scene modelling from video. ACM Transactions on Graphics, 26(3), 86.

  • Hoiem, D., Efros, A., & Hebert, M. (2005). Automatic photo pop-up. In SIGGRAPH.

  • Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1).

  • Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In CVPR, (pp. 762–769).

  • Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with gaussian processes for object categorization. In ICCV.

  • Kohli, P., & Torr, P. H. S. (2008). Measuring uncertainty in graph cut solutions. Computer Vision and Image Understanding, 112(1), 30–38.

  • Kohli, P., Nickisch, H., Rother, C., & Rhemann, C. (2012). User-centric learning and evaluation of interactive segmentation systems. International Journal of Computer Vision.

  • Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? Pattern Analysis and Machine Intelligence, 26(2), 147–159.

  • Kowdle, A., Batra, D., Chen, W., & Chen, T. (2010). iModel: Interactive co-segmentation for object of interest 3d modeling. In ECCV RMLE Workshop.

  • Kowdle, A., Chang, Y., Batra, D., & Chen, T. (2011a). Scribble based interactive 3d reconstruction via scene cosegmentation. In ICIP.

  • Kowdle, A., Chang, Y., Gallagher, A., & Chen, T. (2011b). Active learning for piecewise planar 3d reconstruction. In CVPR.

  • Kowdle, A., Liu, H., Hsu, S., Lew, J., Puri, C., Batra, D., & Chen, T. (2012a). iModel: Object of interest 3d modeling via interactive co-segmentation on a mobile device. In Demo session at CVPR.

  • Kowdle, A., Sinha, S., & Szeliski, R. (2012b). Multiple view object cosegmentation using appearance and stereo cues. In ECCV.

  • Lafarge, F., Keriven, R., Brédif, M., & Hiep, V. (2010). Hybrid multi-view reconstruction by jump-diffusion. In CVPR.

  • Lee, W., Woo, W., & Boyer, E. (2007). Identifying foreground from multiple images. In ACCV.

  • McGuinness, K., & O’Connor, N.E. (2012). Toward automated evaluation of interactive segmentation. Computer Vision and Image Understanding, 115(6), 868–884.

  • Micusík, B., & Kosecká, J. (2010). Multi-view superpixel stereo in urban environments. International Journal of Computer Vision, 89(1), 106–119.

  • Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., et al. (2004). Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3), 207–232.

  • Pollefeys, M., Nistér, D., Frahm, J., Akbarzadeh, A., Mordohai, P., Clipp, B., et al. (2008). Detailed real-time urban 3d reconstruction from video. International Journal of Computer Vision, 78(2–3), 143–167.

  • Saxena, A., Sun, M., & Ng, A. Y. (2009). Make3d: Learning 3d scene structure from a single still image. Pattern Analysis and Machine Intelligence, 31(5), 824–840.

  • Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1–3), 7–42.

  • Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., & Szeliski, R. (2006). A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR.

  • Sinha, S., Steedly, D., Szeliski, R., Agrawala, M., & Pollefeys, M. (2008). Interactive 3d architectural modeling from unordered photo collections. In SIGGRAPH Asia.

  • Sinha, S., Steedly, D., & Szeliski, R. (2009). Piecewise planar stereo for image-based rendering. In ICCV.

  • Sketchup. (2000). Google sketchup. http://sketchup.google.com/. Accessed 7 March 2013.

  • Snavely, N., Seitz, S., & Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3d. In SIGGRAPH.

  • Srivastava, S., Saxena, A., Theobalt, C., Thrun, S., & Ng, A.Y. (2009). i23 - Rapid interactive 3d reconstruction from a single image. In Vision, Modeling and Visualization.

  • Sturm, P.F., & Maybank, S.J. (1999). A method for interactive 3d reconstruction of piecewise planar objects from single images. In BMVC.

  • Szeliski, R. (1993). Rapid octree construction from image sequences. Computer Vision Graphics and Image Processing, 58(1), 23–32.

  • Tang, K., Kowdle, A., Batra, D., & Chen, T. (2009). iScribble. http://chenlab.ece.cornell.edu/projects/iScribble/iScribble.html. Accessed 7 March 2013.

  • Vicente, S., Rother, C., & Kolmogorov, V. (2011). Object cosegmentation. In CVPR.

  • Vijayanarasimhan, S., Jain, P., & Grauman, K. (2010). Far-sighted active learning on a budget for image and video recognition. In CVPR.

  • Yan, R., Yang, J., & Hauptmann, A. (2003). Automatically labeling video data using multi-class active learning. In ICCV.

  • Zhou, X. S., & Huang, T. S. (2003). Relevance feedback in image retrieval: A comprehensive review. Multimedia Systems, 8(6), 536–544.

Acknowledgments

The authors thank Anandram Sundar for the data annotation.

Author information

Corresponding author

Correspondence to Adarsh Kowdle.

Additional information

Communicated by Dr. S.J. Belongie and Dr. Kristen Grauman.

About this article

Cite this article

Kowdle, A., Chang, YJ., Gallagher, A. et al. Putting the User in the Loop for Image-Based Modeling. Int J Comput Vis 108, 30–48 (2014). https://doi.org/10.1007/s11263-014-0704-x

  • DOI: https://doi.org/10.1007/s11263-014-0704-x
