
AgileGAN: stylizing portraits by inversion-consistent transfer learning

Published: 19 July 2021

Abstract

Portraiture as an art form has evolved from realistic depiction into a plethora of creative styles. While substantial progress has been made in automated stylization, generating high quality stylistic portraits is still a challenge, and even the recent popular Toonify suffers from several artifacts when used on real input images. Such StyleGAN-based methods have focused on finding the best latent inversion mapping for reconstructing input images; however, our key insight is that this does not lead to good generalization to different portrait styles. Hence we propose AgileGAN, a framework that can generate high quality stylistic portraits via inversion-consistent transfer learning. We introduce a novel hierarchical variational autoencoder to ensure the inverse mapped distribution conforms to the original latent Gaussian distribution, while augmenting the original space to a multi-resolution latent space so as to better encode different levels of detail. To better capture attribute-dependent stylization of facial features, we also present an attribute-aware generator and adopt an early stopping strategy to avoid overfitting small training datasets. Our approach provides greater agility in creating high quality and high resolution (1024×1024) portrait stylization models, requiring only a limited number of style exemplars (~100) and short training time (~1 hour). We collected several style datasets for evaluation including 3D cartoons, comics, oil paintings and celebrities. We show that we can achieve superior portrait stylization quality to previous state-of-the-art methods, with comparisons done qualitatively, quantitatively and through a perceptual user study. We also demonstrate two applications of our method, image editing and motion retargeting.
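To make the inversion-consistent idea concrete, here is a minimal PyTorch-style sketch (not the authors' released code; the class and function names are hypothetical). It illustrates the mechanism the abstract describes: a variational encoder predicts a per-level Gaussian posterior over a multi-resolution latent space, and a KL term pulls those posteriors toward the standard Gaussian prior of the pretrained StyleGAN, so that codes inverted from real photos stay in-distribution for the fine-tuned stylized generator.

    import torch
    import torch.nn as nn

    class HierarchicalVAEEncoder(nn.Module):
        """Hypothetical encoder: maps a portrait photo to per-level Gaussian
        posteriors over a multi-resolution latent space (one 512-d code per
        StyleGAN2 style input; 18 levels at 1024x1024)."""
        def __init__(self, num_levels=18, latent_dim=512):
            super().__init__()
            # Stand-in for a real multi-scale feature backbone.
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
            )
            self.mu = nn.Linear(64, num_levels * latent_dim)
            self.log_var = nn.Linear(64, num_levels * latent_dim)
            self.num_levels, self.latent_dim = num_levels, latent_dim

        def forward(self, x):
            h = self.backbone(x)
            mu = self.mu(h).view(-1, self.num_levels, self.latent_dim)
            log_var = self.log_var(h).view(-1, self.num_levels, self.latent_dim)
            # Reparameterization trick: sample z ~ N(mu, sigma^2).
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
            return z, mu, log_var

    def kl_to_standard_normal(mu, log_var):
        # KL(q(z|x) || N(0, I)) for a diagonal Gaussian. Minimizing this pulls
        # inverted codes toward the Gaussian prior the generator was trained
        # with, so the fine-tuned stylized generator sees in-distribution codes.
        return 0.5 * torch.sum(torch.exp(log_var) + mu**2 - 1.0 - log_var, dim=-1).mean()

    # Example: invert a batch of photos and compute the regularizer.
    encoder = HierarchicalVAEEncoder()
    z, mu, log_var = encoder(torch.randn(2, 3, 256, 256))
    loss_kl = kl_to_standard_normal(mu, log_var)

In the full pipeline this regularizer would be combined with reconstruction and adversarial losses when training the encoder; the stylized generator is then obtained by fine-tuning the pretrained StyleGAN on the small exemplar set with early stopping.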


Supplemental Material

• a117-song.mp4 (MP4, 89.6 MB)
• 3450626.3459771.mp4: Presentation (MP4, 479.4 MB)



• Published in

  ACM Transactions on Graphics, Volume 40, Issue 4
  August 2021, 2170 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3450626

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

• Published: 19 July 2021
• Published in TOG Volume 40, Issue 4


      Qualifiers

      • research-article
