High-Fidelity GAN Inversion with Padding Space

Bai, Qingyan; Xu, Yinghao; Zhu, Jiapeng; Xia, Weihao; Yang, Yujiu; Shen, Yujun

doi:10.1007/978-3-031-19784-0_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13675)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Inverting a Generative Adversarial Network (GAN) facilitates a wide range of image editing tasks using pre-trained generators. Existing methods typically employ the latent space of GANs as the inversion space, yet they recover spatial details insufficiently. In this work, we propose to involve the padding space of the generator to complement the latent space with spatial information. Concretely, we replace the constant padding (usually zeros) used in convolution layers with instance-aware coefficients. In this way, the inductive bias assumed in the pre-trained model can be adapted to fit each individual image. By learning a carefully designed encoder, we improve the inversion quality both qualitatively and quantitatively, outperforming existing alternatives. We then demonstrate that such a space extension barely affects the native GAN manifold, so the prior knowledge learned by GANs can still be reused for various downstream applications. Beyond the editing tasks explored in prior work, our approach allows more flexible image manipulation, such as separate control of face contour and facial details, and enables a novel editing manner in which users can customize their own manipulations efficiently. (A project page accompanies the paper.)
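The core idea of replacing constant padding with per-image coefficients can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names `pad_with_coefficients` and `conv3x3`, the single-channel 3x3 convolution, and the hand-written padding map are all assumptions made for illustration. In the paper the padding coefficients are predicted by an encoder; here they are simply given.

```python
from typing import List

Matrix = List[List[float]]


def pad_with_coefficients(feature: Matrix, padding: Matrix) -> Matrix:
    """Surround an H x W feature map with a one-pixel border.

    `padding` is an (H+2) x (W+2) map; only its border cells are used.
    An all-zero `padding` reproduces ordinary zero padding.
    """
    h, w = len(feature), len(feature[0])
    assert len(padding) == h + 2 and all(len(row) == w + 2 for row in padding)
    padded = [row[:] for row in padding]  # copy so the input is not mutated
    for i in range(h):
        for j in range(w):
            padded[i + 1][j + 1] = feature[i][j]
    return padded


def conv3x3(padded: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1 3x3 cross-correlation over an already padded map."""
    h, w = len(padded) - 2, len(padded[0]) - 2
    return [
        [
            sum(padded[i + di][j + dj] * kernel[di][dj]
                for di in range(3) for dj in range(3))
            for j in range(w)
        ]
        for i in range(h)
    ]


feature = [[1.0, 1.0], [1.0, 1.0]]
kernel = [[1.0] * 3 for _ in range(3)]  # sums every value in the window

# Constant (zero) padding: the border carries no information.
zero_padding = [[0.0] * 4 for _ in range(4)]
zero_out = conv3x3(pad_with_coefficients(feature, zero_padding), kernel)

# Hypothetical instance-aware padding: the left border column is set to 1.0.
aware_padding = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
aware_out = conv3x3(pad_with_coefficients(feature, aware_padding), kernel)

print(zero_out)   # [[4.0, 4.0], [4.0, 4.0]]
print(aware_out)  # [[7.0, 4.0], [7.0, 4.0]]
```

With zero padding every output location sees the same values, whereas the non-constant border changes only the outputs next to it. This is the sense in which padding coefficients can carry spatial information that a global latent code alone does not.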

Q. Bai and Y. Xu: equal contribution.

Notes

1. We empirically verify that 32 is the best choice in Sect. 4.2.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (Grant No. 61991450) and the Shenzhen Key Laboratory of Marine IntelliSense and Computation (ZDSYS20200811142605016). We thank Zhiyi Zhang for the technical support.

Author information

Authors and Affiliations

Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Qingyan Bai & Yujiu Yang
CUHK, Shatin, Hong Kong
Yinghao Xu
HKUST, Sai Kung, Hong Kong
Jiapeng Zhu
University College London, London, UK
Weihao Xia
ByteDance Inc., Shenzhen, China
Yujun Shen

Corresponding author

Correspondence to Yujiu Yang.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 13472 KB)

Cite this paper

Bai, Q., Xu, Y., Zhu, J., Xia, W., Yang, Y., Shen, Y. (2022). High-Fidelity GAN Inversion with Padding Space. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13675. Springer, Cham. https://doi.org/10.1007/978-3-031-19784-0_3

DOI: https://doi.org/10.1007/978-3-031-19784-0_3
Published: 31 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19783-3
Online ISBN: 978-3-031-19784-0
eBook Packages: Computer Science (R0)
