Abstract
Many art forms present visual content as a single image captured from a particular viewpoint. Selecting a meaningful, representative moment from an action performance is difficult even for an experienced artist, yet a well-chosen image can tell a story effectively. This matters in a range of narrative scenarios, such as journalists reporting breaking news, scholars presenting their research, or artists crafting artworks. We address the underlying structures and mechanisms of pictorial narrative with a new concept, the action snapshot, which automates the generation of a meaningful snapshot (a single still image) from an input sequence of scenes. The input dynamic scenes may include several fully animated, interacting characters. We propose a novel information-theoretic method to quantitatively evaluate the information contained in a pose. Taking the top-ranked postures as input, a convolutional neural network is trained with deep reinforcement learning to select the single viewpoint that maximally conveys the information of the sequence. User studies compare the computer-selected poses and viewpoints with those chosen by human participants; the results show that the proposed method effectively assists the selection of the most informative snapshot from animation-intensive scenarios.
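The abstract does not give the exact formulation of the information-theoretic pose score. As a minimal illustrative sketch only, one could rank candidate poses by the Shannon entropy of a normalized joint-angle distribution and keep the top-k; the scoring rule, array layout, and function names below are assumptions for illustration, not the paper's method.

```python
import numpy as np

def pose_entropy(joint_angles):
    """Shannon entropy of a normalized joint-angle magnitude distribution.

    joint_angles: 1-D array of joint rotation magnitudes for one pose.
    This scoring rule is an illustrative assumption, not the paper's exact measure.
    """
    p = np.abs(joint_angles).astype(float)
    p = p / p.sum()
    p = p[p > 0]                       # drop zero bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def top_k_poses(sequence, k=3):
    """Rank poses in a motion sequence (poses x joints) by entropy; return top-k indices."""
    scores = [pose_entropy(pose) for pose in sequence]
    return sorted(range(len(sequence)), key=lambda i: scores[i], reverse=True)[:k]

# Toy sequence: 4 poses, 5 joints each.
rng = np.random.default_rng(0)
seq = rng.uniform(0.0, np.pi, size=(4, 5))
print(top_k_poses(seq, k=2))
```

The selected top-k postures would then be rendered from candidate viewpoints and fed to the viewpoint-selection network described in the abstract.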
Acknowledgements
We are grateful to the reviewers and editors for their valuable comments and constructive suggestions. This work is supported by the National Natural Science Foundation of China (61402374, 61661146002, 61702433), the China Postdoctoral Science Foundation (2014M562457, 2016M600506), and the Fundamental Research Funds for the Central Universities (QN2012033). We also thank the researchers who maintain the CMU motion capture database and the Stanford 3D Scanning Repository.
Cite this article
Wang, M., Guo, S., Liao, M. et al. Action snapshot with single pose and viewpoint. Vis Comput 35, 507–520 (2019). https://doi.org/10.1007/s00371-018-1479-9