Action snapshot with single pose and viewpoint

  • Original Article

The Visual Computer

Abstract

Many art forms present visual content as a single image captured from a particular viewpoint. Selecting a meaningful, representative moment from an action performance is difficult, even for an experienced artist, yet a well-chosen image can tell a story on its own. This matters for a range of narrative scenarios, such as journalists reporting breaking news, scholars presenting their research, or artists crafting artworks. We address the underlying structures and mechanisms of pictorial narrative with a new concept, the action snapshot, which automates the generation of a meaningful snapshot (a single still image) from an input sequence of scenes. The input dynamic scenes may include several fully animated, interacting characters. We propose a novel method based on information theory to quantitatively evaluate the information contained in a pose. Taking the top-ranked poses as input, a convolutional neural network is constructed and trained with deep reinforcement learning to select the single viewpoint that maximally conveys the information of the sequence. User studies experimentally compare the computer-selected poses and viewpoints with those chosen by human participants. The results show that the proposed method effectively assists in selecting the most informative snapshot from animation-intensive scenarios.
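The abstract only names the two stages, so a small illustration may help. Below is a minimal, hypothetical Python sketch, not the paper's implementation: pose selection is approximated by Shannon self-information under per-joint histograms (an assumed stand-in for the paper's information-theoretic measure), and the learned viewpoint selector is replaced by a crude geometric score that rewards the spread of the projected joints. All function names, parameters, and the toy data are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's implementation): rank poses by
# information-theoretic self-information, then score candidate viewpoints
# for the chosen pose by the spread of the projected joint positions.
import numpy as np

def pose_information(angles: np.ndarray, bins: int = 16) -> np.ndarray:
    """angles: (T, J) joint angles over T frames. Returns (T,) scores:
    a pose counts as more informative when its joint configuration is rarer."""
    T, J = angles.shape
    scores = np.zeros(T)
    for j in range(J):
        hist, edges = np.histogram(angles[:, j], bins=bins)
        p = hist / T                                   # empirical bin probabilities
        idx = np.clip(np.digitize(angles[:, j], edges[1:-1]), 0, bins - 1)
        scores += -np.log2(np.maximum(p[idx], 1e-12))  # self-information, summed over joints
    return scores

def viewpoint_score(joints3d: np.ndarray, azimuth: float) -> float:
    """joints3d: (J, 3) joint positions of one pose. A geometric stand-in for
    the paper's learned selector: rotate the pose about the vertical axis and
    reward the spread of the orthographically projected joints."""
    c, s = np.cos(azimuth), np.sin(azimuth)
    R = np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])
    xy = (joints3d @ R.T)[:, :2]                       # orthographic image-plane projection
    return float(np.var(xy[:, 0]) + np.var(xy[:, 1]))  # larger spread = less self-occlusion

rng = np.random.default_rng(0)
motion = rng.standard_normal((300, 20))                # toy 300-frame, 20-joint clip
best_frame = int(np.argmax(pose_information(motion)))
pose3d = rng.standard_normal((20, 3))                  # toy 3D pose for that frame
best_view = max(np.linspace(0.0, 2 * np.pi, 36), key=lambda a: viewpoint_score(pose3d, a))
print(f"snapshot: frame {best_frame}, azimuth {np.degrees(best_view):.0f} deg")
```

In the paper itself, the viewpoint stage is a convolutional network trained with deep reinforcement learning on candidate views of the top-ranked poses; the heuristic above merely stands in for that stage so the sketch runs end to end.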

Notes

  1. http://mocap.cs.cmu.edu/.

  2. https://accad.osu.edu/research/mocap/mocap_data.htm.

  3. http://mocapdata.com/.

Acknowledgements

We are grateful to the reviewers and editors for their valuable comments and constructive suggestions. This work is supported by the National Natural Science Foundation of China (61402374, 61661146002, 61702433), the China Postdoctoral Science Foundation (2014M562457, 2016M600506) and the Fundamental Research Funds for the Central Universities (QN2012033). We also thank the researchers who maintain the CMU motion capture database and the Stanford 3D Scanning Repository.

Author information

Corresponding author

Correspondence to Shihui Guo.

About this article

Cite this article

Wang, M., Guo, S., Liao, M. et al. Action snapshot with single pose and viewpoint. Vis Comput 35, 507–520 (2019). https://doi.org/10.1007/s00371-018-1479-9
