
paGAN: real-time avatars using dynamic textures

Published: 04 December 2018

Abstract

With the rising interest in personalized VR and gaming experiences comes the need to create high-quality 3D avatars that are both low-cost and varied. Consequently, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we do so using only one 2D input image and without any form of transfer from a source image. We achieve this using a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior, control the eye-gaze direction of the output, and produce the final image from a novel viewpoint. The method can even generate fully controllable, temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshape weights at run time, we can linearly blend these key textures together to achieve the desired appearance. Furthermore, we can use the mouth-interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces state-of-the-art quality in image and video synthesis, and is, to our knowledge, the first able to generate a dynamically textured avatar with a mouth interior, all from a single image.
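For intuition, the run-time step the abstract describes is a weighted linear combination of precomputed key expression textures in UV space, driven by the subject's blendshape weights. Below is a minimal sketch of that blending; the array shapes, the optional neutral-texture delta formulation, and all names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def blend_key_textures(key_textures, weights, neutral=None):
    """Weighted linear blend of precomputed key expression textures.

    key_textures: (N, H, W, 3) float array, one UV-space texture per key
                  blendshape (shapes here are illustrative assumptions).
    weights:      (N,) blendshape activations from the face tracker.
    neutral:      optional (H, W, 3) neutral texture; if given, blending
                  is done on deltas: neutral + sum_i w_i * (T_i - neutral).
    """
    w = np.asarray(weights, dtype=np.float32)
    if neutral is not None:
        # Delta-blendshape style: blend offsets from the neutral texture.
        return neutral + np.tensordot(w, key_textures - neutral, axes=1)
    # Plain weighted sum of the key textures.
    return np.tensordot(w, key_textures, axes=1)

# Example: 8 hypothetical key textures at 256x256, one weight vector per frame.
keys = np.random.rand(8, 256, 256, 3).astype(np.float32)
frame_weights = np.array([0.6, 0.1, 0, 0, 0.3, 0, 0, 0], dtype=np.float32)
texture = blend_key_textures(keys, frame_weights, neutral=keys[0])
```

Because the key textures are fixed after network inference, each frame reduces to this cheap linear combination, which is what makes the approach feasible on mobile devices in real time.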


Supplemental Material

a258-nagano.mp4 (MP4, 117.9 MB)



Published in

ACM Transactions on Graphics, Volume 37, Issue 6
December 2018, 1401 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3272127

      Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


