
Transforming the latent space of StyleGAN for real face editing


Abstract

Despite recent advances in semantic manipulation using StyleGAN, semantic editing of real faces remains challenging. The gap between the W space and the W+ space forces an undesirable trade-off between reconstruction quality and editing quality. To solve this problem, we propose to expand the latent space by replacing the fully connected layers in StyleGAN's mapping network with attention-based transformers. This simple and effective technique integrates the two aforementioned spaces into one new latent space, called W++. Our modified StyleGAN maintains the state-of-the-art generation quality of the original StyleGAN with moderately better diversity. More importantly, the proposed W++ space achieves superior performance in both reconstruction quality and editing quality. Besides these advantages, the W++ space supports existing inversion algorithms and editing methods with only negligible modifications, thanks to its structural similarity to the W/W+ space. Extensive experiments on the FFHQ dataset show that the proposed W++ space is clearly preferable to the previous W/W+ space for real face editing. The code is publicly available for research purposes at https://github.com/AnonSubm2021/TransStyleGAN.
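
The architectural change described in the abstract can be sketched as follows. This is a minimal PyTorch illustration, assuming one learned token per synthesis-network input and a small stack of standard transformer encoder layers; the layer counts, token scheme, and module names are our assumptions, not the authors' exact design (see the linked repository for the reference implementation).

```python
# Minimal sketch (an illustration, not the paper's code) of replacing the
# fully connected mapping network with attention-based transformer layers,
# so that one z code is mapped to a set of per-layer latent codes.
import torch
import torch.nn as nn


class TransformerMapping(nn.Module):
    """Maps a sampled z code to num_ws per-layer latent codes (a W++-style output)."""

    def __init__(self, latent_dim=512, num_ws=14, depth=4, num_heads=8):
        super().__init__()
        # One learned query token per synthesis-network input (hypothetical choice).
        self.tokens = nn.Parameter(torch.randn(num_ws, latent_dim))
        self.embed = nn.Linear(latent_dim, latent_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=num_heads,
            dim_feedforward=4 * latent_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, z):
        # z: (batch, latent_dim); broadcast the embedded z onto every token,
        # then let self-attention mix information across the per-layer codes.
        x = self.tokens.unsqueeze(0) + self.embed(z).unsqueeze(1)
        return self.encoder(x)  # (batch, num_ws, latent_dim)


latents = TransformerMapping()(torch.randn(2, 512))
print(latents.shape)  # torch.Size([2, 14, 512])
```

Because the output already has one code per synthesis layer, it can be consumed by W+-style inversion and editing code with little change, which is the structural similarity the abstract refers to.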

Data availability

The datasets generated and analyzed during the current study are available in the repository, https://github.com/AnonSubm2021/TransStyleGAN.

Notes

  1. Our work builds upon the PyTorch implementation of StyleGANv2 by rosinality, which is publicly available at https://github.com/rosinality/stylegan2-pytorch.

  2. The code is publicly available at https://github.com/clovaai/generative-evaluation-prdc.

  3. The best FID score reported for rosinality’s PyTorch implementation at \(256 \times 256\) resolution is 4.5, while the best score we achieved after multiple runs at the same resolution is 4.69. This performance gap does not affect our findings because the training code is identical.

  4. The code is publicly available at https://github.com/eladrich/pixel2style2pixel.

  5. The code is publicly available at https://github.com/genforce/interfacegan.

  6. The code is publicly available at https://github.com/siriusdemon/pytorch-DEX.

  7. The code is publicly available at https://github.com/sicxu/Deep3DFaceRecon_pytorch.

  8. The code is publicly available at https://github.com/Juyong/3DFace.

References

  1. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)

  2. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)

  3. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of stylegan. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119 (2020)

  4. Shen, Y., Gu, J., Tang, X., Zhou, B.: Interpreting the latent space of gans for semantic face editing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9243–9252 (2020)

  5. Shen, Y., Yang, C., Tang, X., Zhou, B.: Interfacegan: interpreting the disentangled face representation learned by gans. IEEE Trans. Pattern Anal. Mach. Intell. (2020)

  6. Härkönen, E., Hertzmann, A., Lehtinen, J., Paris, S.: Ganspace: discovering interpretable gan controls. Adv. Neural Inf. Process. Syst. 33, 9841–9850 (2020)

  7. Collins, E., Bala, R., Price, B., Susstrunk, S.: Editing in style: uncovering the local semantics of gans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5771–5780 (2020)

  8. Shoshan, A., Bhonker, N., Kviatkovsky, I., Medioni, G.: Gan-control: explicitly controllable gans. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14083–14093 (2021)

  9. Su, W., Ye, H., Chen, S.-Y., Gao, L., Fu, H.: Drawinginstyles: portrait image generation and editing with spatially conditioned stylegan. IEEE Trans. Vis. Comput. Graph. (2022)

  10. Shi, Y., Yang, X., Wan, Y., Shen, X.: Semanticstylegan: learning compositional generative priors for controllable image synthesis and editing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11254–11264 (2022)

  11. Abdal, R., Qin, Y., Wonka, P.: Image2stylegan++: how to edit the embedded images? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8296–8305 (2020)

  12. Zhu, J., Shen, Y., Zhao, D., Zhou, B.: In-domain gan inversion for real image editing. In: European Conference on Computer Vision, pp. 592–608. Springer (2020)

  13. Abdal, R., Zhu, P., Mitra, N.J., Wonka, P.: Styleflow: attribute-conditioned exploration of stylegan-generated images using conditional continuous normalizing flows. ACM Trans. Graph. (ToG) 40(3), 1–21 (2021)

  14. Tewari, A., Elgharib, M., Bernard, F., Seidel, H.-P., Pérez, P., Zollhöfer, M., Theobalt, C.: Pie: portrait image embedding for semantic control. ACM Trans. Graph. (TOG) 39(6), 1–14 (2020)

  15. Hou, X., Zhang, X., Liang, H., Shen, L., Lai, Z., Wan, J.: Guidedstyle: attribute knowledge guided style manipulation for semantic face editing. Neural Netw. 145, 209–220 (2022)

  16. Abdal, R., Qin, Y., Wonka, P.: Image2stylegan: how to embed images into the stylegan latent space? In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4432–4441 (2019)

  17. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)

  18. Richardson, E., Alaluf, Y., Patashnik, O., Nitzan, Y., Azar, Y., Shapiro, S., Cohen-Or, D.: Encoding in style: a stylegan encoder for image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2287–2296 (2021)

  19. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)

  20. Xia, W., Zhang, Y., Yang, Y., Xue, J.-H., Zhou, B., Yang, M.-H.: Gan inversion: a survey. IEEE Trans. Pattern Anal. Mach. Intell. (2022)

  21. robertluxemburg: Git repository: stylegan2encoder (2020). https://github.com/robertluxemburg/stylegan2encoder

  22. Tov, O., Alaluf, Y., Nitzan, Y., Patashnik, O., Cohen-Or, D.: Designing an encoder for stylegan image manipulation. ACM Trans. Graph. (TOG) 40(4), 1–14 (2021)

  23. Alaluf, Y., Patashnik, O., Cohen-Or, D.: Restyle: a residual-based stylegan encoder via iterative refinement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

  24. Roich, D., Mokady, R., Bermano, A.H., Cohen-Or, D.: Pivotal tuning for latent-based editing of real images. ACM Trans. Graph. (2021)

  25. Alaluf, Y., Tov, O., Mokady, R., Gal, R., Bermano, A.: Hyperstyle: stylegan inversion with hypernetworks for real image editing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18511–18521 (2022)

  26. Tewari, A., Elgharib, M., Bharaj, G., Bernard, F., Seidel, H.-P., Pérez, P., Zollhofer, M., Theobalt, C.: Stylerig: rigging stylegan for 3d control over portrait images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6142–6151 (2020)

  27. Ju, Y., Zhang, J., Mao, X., Xu, J.: Adaptive semantic attribute decoupling for precise face image editing. Vis. Comput. 37(9), 2907–2918 (2021)

  28. Lin, C., Xiong, S., Lu, X.: Disentangled face editing via individual walk in personalized facial semantic field. Vis. Comput. 1–10 (2022)

  29. Shen, Y., Zhou, B.: Closed-form factorization of latent semantics in gans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1532–1540 (2021)

  30. Zhu, P., Abdal, R., Qin, Y., Wonka, P.: Improved stylegan embedding: where are the good latents? arXiv preprint arXiv:2012.09036 (2020)

  31. Liu, Y., Li, Q., Sun, Z., Tan, T.: Style intervention: How to achieve spatial disentanglement with style-based generators? arXiv preprint arXiv:2011.09699 (2020)

  32. Wu, Z., Lischinski, D., Shechtman, E.: Stylespace analysis: disentangled controls for stylegan image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12863–12872 (2021)

  33. Zhang, B., Gu, S., Zhang, B., Bao, J., Chen, D., Wen, F., Wang, Y., Guo, B.: Styleswin: transformer-based gan for high-resolution image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11304–11314 (2022)

  34. Xu, Y., Yin, Y., Jiang, L., Wu, Q., Zheng, C., Loy, C.C., Dai, B., Wu, W.: Transeditor: transformer-based dual-space gan for highly controllable facial editing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7683–7692 (2022)

  35. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

  36. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 30 (2017)

  37. Kynkäänniemi, T., Karras, T., Laine, S., Lehtinen, J., Aila, T.: Improved precision and recall metric for assessing generative models. Adv. Neural Inf. Process. Syst. 32 (2019)

  38. Naeem, M.F., Oh, S.J., Uh, Y., Choi, Y., Yoo, J.: Reliable fidelity and diversity metrics for generative models. In: International Conference on Machine Learning, pp. 7176–7185. PMLR (2020)

  39. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)

  40. Rothe, R., Timofte, R., Gool, L.V.: Dex: deep expectation of apparent age from a single image. In: IEEE International Conference on Computer Vision Workshops (ICCVW) (2015)

  41. Deng, Y., Yang, J., Xu, S., Chen, D., Jia, Y., Tong, X.: Accurate 3d face reconstruction with weakly-supervised learning: from single image to image set. In: IEEE Computer Vision and Pattern Recognition Workshops (2019)

  42. Paysan, P., Knothe, R., Amberg, B., Romdhani, S., Vetter, T.: A 3D Face Model for Pose and Illumination Invariant Face Recognition. IEEE, Genova, Italy (2009)

  43. Guo, Y., Zhang, J., Cai, J., Jiang, B., Zheng, J.: Cnn-based real-time dense face reconstruction with inverse-rendered photo-realistic face images. IEEE Trans. Pattern Anal. Mach. Intell. 41(6), 1294–1307 (2019)

Acknowledgements

This research was partially supported by NSF grants IIS 1527200 and 1941613. We would like to thank Jun Fu, Jiayi Liu, Shen Wang, Zhihang Li, and Jie Yang for their early feedback and discussions.

Author information

Corresponding author

Correspondence to Heyi Li.

Ethics declarations

Conflict of interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Additional results for age and gender manipulation using InterfaceGAN

Two more examples of age transformation are provided in Figs. 10 and 11, and Figs. 12 and 13 show two additional examples of gender transition. As the editing distance between the latent code and the classification boundary increases, edited images in the W and W+ spaces deteriorate significantly. For gender manipulation in the W space in particular, the race attribute becomes entangled with the gender attribute, producing an unintended transition from white to Asian. Our proposed W++ space, in contrast, consistently preserves untargeted attributes even under long-distance manipulation.
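
For reference, the InterfaceGAN-style manipulation discussed here moves a latent code along the unit normal of a linear attribute boundary. The sketch below assumes a precomputed boundary vector and illustrative step sizes, not the exact values used in the experiments.

```python
# Sketch of InterfaceGAN-style linear editing; the boundary vector is assumed
# to come from a linear classifier (e.g., an SVM) trained on attribute labels.
import numpy as np


def edit_latent(w, boundary, distances=(-3.0, -1.5, 0.0, 1.5, 3.0)):
    """Return edited copies of w at several signed distances from the boundary.

    w:        (num_ws, 512) latent code in the W++ (or W/W+) space.
    boundary: (512,) normal vector of the attribute boundary.
    """
    n = boundary / np.linalg.norm(boundary)
    # Larger |distance| corresponds to the long-distance manipulation above,
    # where W/W+ edits degrade while W++ preserves untargeted attributes.
    return [w + d * n for d in distances]
```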

Appendix B Additional results for real image editing using the cGAN-based pipeline

Figures 14 and 15 show further comparisons of manipulating real images with respect to the attribute smile using our proposed cGAN-based editing pipeline in different latent spaces. Edited images in the W or the W+ space show either limited or imperceptible effects. In contrast, our proposed W++ space produces the most natural smile expressions and clearly outperforms both the W and the W+ spaces.
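
The full cGAN-based pipeline is not reproduced in this preview. Purely as an illustration of the general idea of conditioning a latent edit on a target attribute label, a sketch might look like the following, where the module name, conditioning scheme, and dimensions are all assumptions rather than the paper's design.

```python
# Illustrative conditional latent editor (not the paper's pipeline): predicts
# an offset for each per-layer code given a target attribute label such as smile.
import torch
import torch.nn as nn


class ConditionalLatentEditor(nn.Module):
    def __init__(self, latent_dim=512, num_ws=14, num_attrs=1, hidden=512):
        super().__init__()
        self.num_ws = num_ws
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_attrs, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, latent_dim))

    def forward(self, w, label):
        # w: (batch, num_ws, latent_dim); label: (batch, num_attrs) target attribute.
        cond = label.unsqueeze(1).expand(-1, self.num_ws, -1)
        return w + self.net(torch.cat([w, cond], dim=-1))
```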

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Liu, J., Zhang, X. et al. Transforming the latent space of StyleGAN for real face editing. Vis Comput 40, 3553–3568 (2024). https://doi.org/10.1007/s00371-023-03051-1
