Putting the User in the Loop for Image-Based Modeling

Kowdle, Adarsh; Chang, Yao-Jen; Gallagher, Andrew; Batra, Dhruv; Chen, Tsuhan

doi:10.1007/s11263-014-0704-x

Published: 12 March 2014

Volume 108, pages 30–48, (2014)

International Journal of Computer Vision

Adarsh Kowdle¹,
Yao-Jen Chang²,
Andrew Gallagher¹,
Dhruv Batra³ &
…
Tsuhan Chen¹

Abstract

We refer to the task of recovering the 3D structure of an object or a scene from 2D images as image-based modeling. In this paper, we formulate this task as a discrete optimization problem solved via energy minimization. Within this standard framework of a Markov random field (MRF) defined over the image, we present algorithms that allow the user to interact intuitively with the algorithm. We introduce an algorithm in which the user guides the process of image-based modeling to find and model the object of interest by manually interacting with the nodes of the graph. Using this algorithm, we develop end-user applications that allow 3D modeling of the object of interest on a mobile device and 3D printing of the object of interest. We also propose an alternative active learning algorithm that guides the user input. An initial attempt is made at reconstructing the scene without supervision. Given this reconstruction, the active learning algorithm uses intuitive cues to quantify the uncertainty of the algorithm and suggest regions, querying the user to provide support for the uncertain regions via simple scribbles. These constraints are used to update the unary and pairwise energies that, when solved, lead to better reconstructions. We show through machine experiments and a user study that the proposed approach intelligently queries the users for constraints, and that users achieve better reconstructions of the scene faster, especially for scenes with textureless surfaces lacking the strong textural or structural cues that algorithms typically require.
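The loop described in the abstract (minimize an MRF energy, find the most uncertain region, ask the user for a scribble, update the energies, and solve again) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the authors' method: the Potts pairwise term, the weight `lam`, the iterated conditional modes (ICM) solver standing in for whatever minimizer the paper uses, the cost-margin uncertainty measure, and all function names and numbers.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def neighbours(n: int, edges: List[Edge]) -> List[List[int]]:
    """Adjacency lists for an undirected graph over n nodes (e.g. superpixels)."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def energy(labels, unary, edges, lam):
    """Total energy: sum of unary costs plus a Potts penalty per disagreeing edge."""
    data = sum(unary[i][labels[i]] for i in range(len(labels)))
    smooth = sum(lam for i, j in edges if labels[i] != labels[j])
    return data + smooth


def local_costs(i, labels, unary, adj, lam):
    """Cost of each candidate label at node i with its neighbours held fixed."""
    return [
        unary[i][k] + lam * sum(1 for j in adj[i] if labels[j] != k)
        for k in range(len(unary[i]))
    ]


def icm(unary, edges, lam, max_sweeps=20):
    """Iterated conditional modes: a simple local minimizer (stand-in solver)."""
    adj = neighbours(len(unary), edges)
    # Start from the label with the lowest unary cost at every node.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unary]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(unary)):
            costs = local_costs(i, labels, unary, adj, lam)
            best = min(range(len(costs)), key=costs.__getitem__)
            if costs[best] < costs[labels[i]]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


def most_uncertain(labels, unary, edges, lam):
    """Node whose best and second-best local costs are closest (assumed cue)."""
    adj = neighbours(len(unary), edges)

    def margin(i):
        costs = sorted(local_costs(i, labels, unary, adj, lam))
        return costs[1] - costs[0]

    return min(range(len(unary)), key=margin)


def apply_scribble(unary, node, label, big=1e6):
    """Return a copy of the unary costs with one node clamped to the user's label."""
    updated = [list(u) for u in unary]
    updated[node] = [0.0 if k == label else big for k in range(len(unary[node]))]
    return updated


# Toy chain of four nodes with two labels; the numbers are made up.
unary = [[0.0, 2.0], [0.9, 1.0], [1.2, 0.6], [2.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3)]
lam = 0.5

labels = icm(unary, edges, lam)                    # unsupervised first attempt
query = most_uncertain(labels, unary, edges, lam)  # node to show the user
refined = icm(apply_scribble(unary, query, 1), edges, lam)  # user says label 1
print(labels, query, refined)
```

In this sketch a scribble only clamps the unary cost of the queried node. The abstract states that the user constraints also update the pairwise energies, which the toy example does not model.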

Notes

Superpixels are used to help reduce computational complexity.
We use mean-shift segmentation (Comaniciu and Meer 2002) to break an image into about a thousand superpixels.
The parameter \(\lambda \) is set to 0.5.
http://chenlab.ece.cornell.edu/projects/Interactive_3D.
We use graph based segmentation (Felzenszwalb and Huttenlocher 2004) to break each image down to about 400 superpixels.
http://chenlab.ece.cornell.edu/projects/ActiveLearningFor3D.
http://chenlab.ece.cornell.edu/projects/ActiveLearningFor3D.
http://chenlab.ece.cornell.edu/projects/iModel.
The 3D printouts were obtained using the online service http://www.shapeways.com.

References

Bagon, S. (2006). Matlab wrapper for graph cut. http://www.wisdom.weizmann.ac.il/bagon. Accessed 7 March 2013.
Bartoli, A. (2007). A random sampling strategy for piecewise planar scene segmentation. Computer Vision and Image Understanding, 105(1), 42–59.
Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2011). Interactively co-segmenting topically related images with intelligent scribble guidance. International Journal of Computer Vision, 93(3), 273–292.
Baumgart, B.G. (1974). Geometric modeling for computer vision. PhD thesis, Stanford University.
Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, 26(9), 1124–1137.
Boykov, Y., Veksler, O., & Zabih, R. (2001). Efficient approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, 20(12), 1222–1239.
Campbell, N., Vogiatzis, G., Hernández, C., & Cipolla, R. (2007). Automatic 3d object segmentation in multiple views using volumetric graph-cuts. In BMVC, Bristol.
Campbell, N.D., Vogiatzis, G., Hernández, C., & Cipolla, R. (2008). Using multiple hypotheses to improve depth-maps for multi-view stereo. In ECCV.
Chen, Z., Chou, H.L., & Chen, W.C. (2008). A performance controllable octree construction method. In ICPR.
Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Towards scalable dataset construction: An active learning approach. In ECCV.
Comaniciu, D., & Meer, P. (2002). Mean shift: a robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, 24(5), 603–619.
Criminisi, A., Reid, I.D., & Zisserman, A. (1999). Single view metrology. In ICCV.
Debevec, P., Taylor, C., & Malik, J. (1996). Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH.
Fang, Y. H., Chou, H. L., & Chen, Z. (2003). 3D Shape recovery of complex objects from multiple silhouette images. Pattern Recognition Letters, 24(9–10), 1279–1293.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
Forbes, K., Nicolls, F., de Jager, G., & Voigt, A. (2006). Shape-from-silhouette with two mirrors and an uncalibrated camera. In ECCV, (pp. 165–178).
Furukawa, Y., & Ponce, J. (2009). Accurate, dense, and robust multi-view stereopsis. Pattern Analysis and Machine Intelligence, 32(8), 1362–1376.
Furukawa, Y., Curless, B., Seitz, S., & Szeliski, R. (2009). Reconstructing building interiors from images. In ICCV.
Furukawa, Y., Curless, B., Seitz, S.M., & Szeliski, R. (2010). Towards internet-scale multi-view stereo. In CVPR.
Gallup, D., Frahm, J., & Pollefeys, M. (2010). Piecewise planar and non-planar stereo for urban scene reconstruction. In CVPR.
Goesele, M., Snavely, N., Curless, B., Hoppe, H., & Seitz, S.M. (2007). Multi-view stereo for community photo collections. In ICCV.
Gosselin, P. H., & Cord, M. (2008). Active learning methods for interactive image retrieval. IEEE Transactions on Image Processing, 17(7), 1200–1211.
Hengel, A., Dick, A. R., Thormählen, T., Ward, B., & Torr, P. H. S. (2007). Videotrace: Rapid interactive scene modelling from video. ACM Transactions on Graphics, 26(3), 86.
Hoiem, D., Efros, A., & Hebert, M. (2005). Automatic photo pop-up. In SIGGRAPH.
Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1).
Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In CVPR, (pp. 762–769).
Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with gaussian processes for object categorization. In ICCV.
Kohli, P., & Torr, P. H. S. (2008). Measuring uncertainty in graph cut solutions. Computer Vision and Image Understanding, 112(1), 30–38.
Kohli, P., Nickisch, H., Rother, C., & Rhemann, C. (2012). User-centric learning and evaluation of interactive segmentation systems. In IJCV.
Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? Pattern Analysis and Machine Intelligence, 26(2), 147–159.
Kowdle, A., Batra, D., Chen, W., & Chen, T. (2010). iModel: Interactive co-segmentation for object of interest 3d modeling. In ECCV – RMLE Workshop.
Kowdle, A., Chang, Y., Batra, D., & Chen, T. (2011a). Scribble based interactive 3d reconstruction via scene cosegmentation. In ICIP.
Kowdle, A., Chang, Y., Gallagher, A., & Chen, T. (2011b). Active learning for piecewise planar 3d reconstruction. In CVPR.
Kowdle, A., Liu, H., Hsu, S., Lew, J., Puri, C., Batra, D., & Chen, T. (2012a). iModel: Object of interest 3d modeling via interactive co-segmentation on a mobile device. In Demo session at CVPR.
Kowdle, A., Sinha, S., & Szeliski, R. (2012b). Multiple view object cosegmentation using appearance and stereo cues. In ECCV.
Lafarge, F., Keriven, R., Brédif, M., & Hiep, V. (2010). Hybrid multi-view reconstruction by jump-diffusion. In CVPR.
Lee, W., Woo, W., & Boyer, E. (2007). Identifying foreground from multiple images. In ACCV.
McGuinness, K., & O’Connor, N.E. (2012). Toward automated evaluation of interactive segmentation. Computer Vision and Image Understanding, 115(6), 868–884.
Micusík, B., & Kosecká, J. (2010). Multi-view superpixel stereo in urban environments. International Journal of Computer Vision, 89(1), 106–119.
Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., et al. (2004). Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3), 207–232.
Pollefeys, M., Nistér, D., Frahm, J., Akbarzadeh, A., Mordohai, P., Clipp, B., et al. (2008). Detailed real-time urban 3d reconstruction from video. International Journal of Computer Vision, 78(2–3), 143–167.
Saxena, A., Sun, M., & Ng, A. Y. (2009). Make3d: Learning 3d scene structure from a single still image. Pattern Analysis and Machine Intelligence, 31(5), 824–840.
Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1–3), 7–42.
Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., & Szeliski, R. (2006). A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR.
Sinha, S., Steedly, D., Szeliski, R., Agrawala, M., & Pollefeys, M. (2008). Interactive 3d architectural modeling from unordered photo collections. In SIGGRAPH Asia.
Sinha, S., Steedly, D., & Szeliski, R. (2009). Piecewise planar stereo for image-based rendering. In ICCV.
Sketchup. (2000). Google sketchup. http://sketchup.google.com/. Accessed 7 March 2013.
Snavely, N., Seitz, S., & Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3d. In SIGGRAPH.
Srivastava, S., Saxena, A., Theobalt, C., Thrun, S., & Ng, A.Y. (2009). i23 - Rapid interactive 3d reconstruction from a single image. In Vision, Modeling and Visualization.
Sturm, P.F., & Maybank, S.J. (1999). A method for interactive 3d reconstruction of piecewise planar objects from single images. In BMVC.
Szeliski, R. (1993). Rapid octree construction from image sequences. Computer Vision Graphics and Image Processing, 58(1), 23–32.
Tang, K., Kowdle, A., Batra, D., & Chen, T. (2009). iScribble. http://chenlab.ece.cornell.edu/projects/iScribble/iScribble.html. Accessed 7 March 2013.
Vicente, S., Rother, C., & Kolmogorov, V. (2011). Object cosegmentation. In CVPR.
Vijayanarasimhan, S., Jain, P., & Grauman, K. (2010). Far-sighted active learning on a budget for image and video recognition. In CVPR.
Yan, R., Yang, J., & Hauptmann, A. (2003). Automatically labeling video data using multi-class active learning. In ICCV.
Zhou, X. S., & Huang, T. S. (2003). Relevance feedback in image retrieval: A comprehensive review. Multimedia Systems, 8(6), 536–544.

Acknowledgments

The authors thank Anandram Sundar for the data annotation.

Author information

Authors and Affiliations

Cornell University, Ithaca, NY, USA
Adarsh Kowdle, Andrew Gallagher & Tsuhan Chen
Siemens Corporation, Corporate Technology, Princeton, NJ, USA
Yao-Jen Chang
Virginia Tech, Blacksburg, VA, USA
Dhruv Batra

Corresponding author

Correspondence to Adarsh Kowdle.

Additional information

Communicated by Dr. S.J. Belongie and Dr. Kristen Grauman.

About this article

Cite this article

Kowdle, A., Chang, YJ., Gallagher, A. et al. Putting the User in the Loop for Image-Based Modeling. Int J Comput Vis 108, 30–48 (2014). https://doi.org/10.1007/s11263-014-0704-x

Received: 07 March 2013
Accepted: 14 February 2014
Published: 12 March 2014
Issue Date: May 2014
DOI: https://doi.org/10.1007/s11263-014-0704-x
