research-article

AgileGAN: stylizing portraits by inversion-consistent transfer learning

Authors:
Guoxian Song, Nanyang Technological University, Singapore
Linjie Luo, ByteDance Inc
Jing Liu, ByteDance Inc
Wan-Chun Ma, ByteDance Inc
Chunpong Lai, ByteDance Inc
Chuanxia Zheng, Nanyang Technological University, Singapore
Tat-Jen Cham, Nanyang Technological University, Singapore

ACM Transactions on Graphics, Volume 40, Issue 4, Article No. 117, pp. 1–13. https://doi.org/10.1145/3450626.3459771

Published: 19 July 2021

Abstract

Portraiture as an art form has evolved from realistic depiction into a plethora of creative styles. While substantial progress has been made in automated stylization, generating high quality stylistic portraits is still a challenge, and even the recent popular Toonify suffers from several artifacts when used on real input images. Such StyleGAN-based methods have focused on finding the best latent inversion mapping for reconstructing input images; however, our key insight is that this does not lead to good generalization to different portrait styles. Hence we propose AgileGAN, a framework that can generate high quality stylistic portraits via inversion-consistent transfer learning. We introduce a novel hierarchical variational autoencoder to ensure the inverse mapped distribution conforms to the original latent Gaussian distribution, while augmenting the original space to a multi-resolution latent space so as to better encode different levels of detail. To better capture attribute-dependent stylization of facial features, we also present an attribute-aware generator and adopt an early stopping strategy to avoid overfitting small training datasets. Our approach provides greater agility in creating high quality and high resolution (1024×1024) portrait stylization models, requiring only a limited number of style exemplars (~100) and short training time (~1 hour). We collected several style datasets for evaluation including 3D cartoons, comics, oil paintings and celebrities. We show that we can achieve superior portrait stylization quality to previous state-of-the-art methods, with comparisons done qualitatively, quantitatively and through a perceptual user study. We also demonstrate two applications of our method, image editing and motion retargeting.
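
As a rough illustration of the regularization idea in the abstract (keeping the inverse-mapped distribution close to the original latent Gaussian across several levels of a multi-resolution latent space), the sketch below samples a latent code with the standard VAE reparameterization trick and sums a KL-divergence term over levels. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the number of levels, the latent dimension, and the example encoder outputs are all hypothetical, and a standard normal prior with diagonal covariance is assumed.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def hierarchical_kl(levels):
    """Sum the KL term over (mu, log_var) pairs, one pair per latent level."""
    return sum(kl_to_standard_normal(mu, lv) for mu, lv in levels)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder outputs for two levels of a 4-dimensional latent.
    levels = [
        ([0.0] * 4, [0.0] * 4),  # already matches the prior
        ([1.0] * 4, [0.0] * 4),  # mean shifted away from the prior
    ]
    codes = [reparameterize(mu, lv, rng) for mu, lv in levels]
    print("sampled codes:", codes)
    print("total KL penalty:", hierarchical_kl(levels))
```

In a setup like this, the KL penalty would be added to a reconstruction loss so that encoded latents stay close to the distribution the pretrained generator was trained on; how the terms are weighted is left open here.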

Supplemental Material

a117-song.mp4 (mp4, 89.6 MB)
3450626.3459771.mp4 (presentation, mp4, 479.4 MB)
a117-song.zip (zip, 152.3 MB)
3450626.3459771.vtt (vtt, 16.6 KB)

Index Terms

AgileGAN: stylizing portraits by inversion-consistent transfer learning
1. Computing methodologies
  1. Computer graphics
    1. Rendering
      1. Non-photorealistic rendering

Published in
ACM Transactions on Graphics Volume 40, Issue 4
August 2021
2170 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3450626
Editor:
Sylvain Paris
Adobe Inc.
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
StyleGAN
image-to-image translation
portrait generation
stylization