Abstract
Many art forms present visual content as a single image captured from a particular viewpoint. Selecting a meaningful, representative moment from an action performance is difficult even for an experienced artist, yet a well-chosen image can tell a story effectively. This matters in a range of narrative scenarios, such as journalists reporting breaking news, scholars presenting their research, or artists crafting artworks. We address the underlying structures and mechanisms of pictorial narrative with a new concept, the action snapshot, which automates the generation of a meaningful snapshot (a single still image) from an input sequence of scenes. The input dynamic scenes may include several fully animated, interacting characters. We propose a novel information-theoretic method to quantitatively evaluate the information contained in a pose. Taking the top-ranked postures as input, a convolutional neural network is trained with deep reinforcement learning to select the single viewpoint that maximally conveys the information of the sequence. User studies compare the computer-selected poses and viewpoints with those chosen by human participants; the results show that the proposed method effectively assists the selection of the most informative snapshot from animation-intensive scenarios.
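The abstract does not give the exact formulation of the information-theoretic pose score. As a minimal illustrative sketch only, one could rank candidate poses by the Shannon entropy of a normalized joint-angle distribution and keep the top-k; the scoring rule, array layout, and function names below are assumptions for illustration, not the paper's method.

```python
import numpy as np

def pose_entropy(joint_angles):
    """Shannon entropy of a normalized joint-angle magnitude distribution.

    joint_angles: 1-D array of joint rotation magnitudes for one pose.
    This scoring rule is an illustrative assumption, not the paper's exact measure.
    """
    p = np.abs(joint_angles).astype(float)
    p = p / p.sum()
    p = p[p > 0]                       # drop zero bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def top_k_poses(sequence, k=3):
    """Rank poses in a motion sequence (poses x joints) by entropy; return top-k indices."""
    scores = [pose_entropy(pose) for pose in sequence]
    return sorted(range(len(sequence)), key=lambda i: scores[i], reverse=True)[:k]

# Toy sequence: 4 poses, 5 joints each.
rng = np.random.default_rng(0)
seq = rng.uniform(0.0, np.pi, size=(4, 5))
print(top_k_poses(seq, k=2))
```

The selected top-k postures would then be rendered from candidate viewpoints and fed to the viewpoint-selection network described in the abstract.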
Acknowledgements
We are grateful to the reviewers and editors for their valuable comments and constructive suggestions. This work is supported by the National Natural Science Foundation of China (61402374, 61661146002, 61702433), the China Postdoctoral Science Foundation (2014M562457, 2016M600506), and the Fundamental Research Funds for the Central Universities (QN2012033). We also thank the researchers who maintain the CMU motion capture database and the Stanford 3D Scanning Repository.
Cite this article
Wang, M., Guo, S., Liao, M. et al. Action snapshot with single pose and viewpoint. Vis Comput 35, 507–520 (2019). https://doi.org/10.1007/s00371-018-1479-9