
Unsupervised Intuitive Physics from Visual Observations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11363)

Abstract

While learning models of intuitive physics is an active area of research, current approaches fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test time. Some approaches sidestep these requirements by building models on top of handcrafted physical simulators. In both cases, however, the methods cannot automatically learn new physical environments and their laws as humans do. In this work, we demonstrate, for the first time, learning unsupervised predictors of physical states, such as the positions of objects in an environment, directly from raw visual observations and without relying on simulators. We do so in two steps: (i) we learn to track dynamically-salient objects in videos using causality and equivariance, two non-generative unsupervised learning principles that do not require manual or external supervision; (ii) we demonstrate that the extracted positions are sufficient to train visual motion predictors that take the underlying environment into account. We validate our predictors on synthetic datasets; we then introduce a new dataset, Roll4Real, consisting of real objects rolling on complex terrains (pool table, elliptical bowl, and random height-field). We show that it is possible to learn reliable object trajectory extrapolators from raw videos alone, without any external supervision and with no more prior knowledge than the choice of a convolutional neural network architecture.

S. Ehrhardt and A. Monszpart contributed equally.
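To make the two-step pipeline described in the abstract concrete, the following is a minimal sketch in PyTorch, assuming a soft-argmax position readout and a small fully-connected extrapolator. The module names, layer sizes, and training target are illustrative assumptions, not the architecture from the paper; in particular, the causality and equivariance objectives used to train the tracker without supervision are omitted here.

```python
# Conceptual sketch only: a two-stage pipeline (unsupervised position
# extraction followed by trajectory extrapolation). All names and sizes
# are hypothetical and do not reproduce the paper's networks or losses.
import torch
import torch.nn as nn

class PositionExtractor(nn.Module):
    """Maps an RGB frame to a single (x, y) location via a score map + soft-argmax."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                      # 1-channel score map
        )

    def forward(self, frames):                        # frames: (B, 3, H, W)
        scores = self.features(frames)                # (B, 1, H, W)
        b, _, h, w = scores.shape
        probs = torch.softmax(scores.view(b, -1), dim=1).view(b, h, w)
        ys = torch.linspace(-1, 1, h, device=frames.device)
        xs = torch.linspace(-1, 1, w, device=frames.device)
        y = (probs.sum(dim=2) * ys).sum(dim=1)        # expected row coordinate
        x = (probs.sum(dim=1) * xs).sum(dim=1)        # expected column coordinate
        return torch.stack([x, y], dim=1)             # (B, 2) in [-1, 1]

class TrajectoryExtrapolator(nn.Module):
    """Predicts the next position from a short window of past positions."""
    def __init__(self, window=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(window * 2, 64), nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, past_xy):                       # past_xy: (B, window, 2)
        return self.mlp(past_xy.flatten(1))           # (B, 2)

# Usage on dummy data: extract positions frame by frame, then extrapolate.
frames = torch.rand(4, 5, 3, 64, 64)                  # (batch, time, C, H, W)
detector, predictor = PositionExtractor(), TrajectoryExtrapolator(window=4)
positions = torch.stack([detector(frames[:, t]) for t in range(5)], dim=1)
next_xy = predictor(positions[:, :4])                 # predict position at t = 4
loss = ((next_xy - positions[:, 4]) ** 2).mean()      # target comes from the tracker
```

The last line illustrates the central idea: the extrapolator's regression target is produced by the unsupervised tracker itself, so ground-truth object positions are never required.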



Acknowledgements

The authors would like to gratefully acknowledge the support of ERC 677195-IDIU and ERC SmartGeometry StG-2013-335373 grants.

Author information


Corresponding author

Correspondence to Sebastien Ehrhardt.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Ehrhardt, S., Monszpart, A., Mitra, N., Vedaldi, A. (2019). Unsupervised Intuitive Physics from Visual Observations. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11363. Springer, Cham. https://doi.org/10.1007/978-3-030-20893-6_44


  • DOI: https://doi.org/10.1007/978-3-030-20893-6_44


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20892-9

  • Online ISBN: 978-3-030-20893-6

  • eBook Packages: Computer Science, Computer Science (R0)
