Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Authors:
Xingang Pan (Max Planck Institute for Informatics, Germany and Saarbrücken Research Center for Visual Computing, Interaction and AI, Germany; ORCID 0000-0002-5825-9467)
Ayush Tewari (Massachusetts Institute of Technology, United States of America; ORCID 0000-0002-3805-4421)
Thomas Leimkühler (Max Planck Institute for Informatics, Germany; ORCID 0009-0006-7784-7957)
Lingjie Liu (Max Planck Institute for Informatics, Germany and University of Pennsylvania, USA; ORCID 0000-0003-4301-1474)
Abhimitra Meka (Google AR/VR, United States of America; ORCID 0000-0001-7906-4004)
Christian Theobalt (Max Planck Institute for Informatics, Germany and Saarbrücken Research Center for Visual Computing, Interaction and AI, Germany; ORCID 0000-0001-6104-6625)

SIGGRAPH '23: ACM SIGGRAPH 2023 Conference Proceedings, July 2023, Article No.: 78, Pages 1–11
https://doi.org/10.1145/3588432.3591500

Published: 23 July 2023

ABSTRACT

Synthesizing visual content that meets users’ needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig. 1. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object’s rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.
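
The point tracking component can be illustrated with a small sketch. The abstract states only that tracking relies on discriminative generator features; the nearest-neighbor search in a square window, the L1 distance, and the names `track_point`, `features`, `ref_feature`, and `radius` below are assumptions chosen for illustration, and the random array merely stands in for a real generator feature map.

```python
import numpy as np


def track_point(features, ref_feature, point, radius):
    """Return the (row, col) inside a square window around `point` whose
    feature vector has the smallest L1 distance to `ref_feature`.

    Illustrative assumption only: the window search and the L1 metric are
    not taken from the abstract.
    """
    height, width, _ = features.shape
    row, col = point
    # Clamp the search window to the bounds of the feature map.
    r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
    c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
    window = features[r0:r1, c0:c1]                    # shape (h, w, C)
    dist = np.abs(window - ref_feature).sum(axis=-1)   # shape (h, w)
    dr, dc = np.unravel_index(np.argmin(dist), dist.shape)
    return int(r0 + dr), int(c0 + dc)


# Toy stand-in for a generator feature map: random features with one
# distinctive vector planted at (5, 6), playing the role of the handle point.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 16, 8))
ref_feature = np.full(8, 10.0)
features[5, 6] = ref_feature

# The previous handle position (4, 4) is stale; the search relocates it.
new_point = track_point(features, ref_feature, point=(4, 4), radius=3)
print(new_point)  # (5, 6)
```

In an interactive loop, a step of this kind would presumably run after each update that moves the handle toward the target, so that the next update starts from the handle's current location rather than its original one.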

Supplemental Material

papers_316_VOD.mp4 (mp4, 166.2 MB)
Demo videos (zip, 569.4 MB)
Supplementary videos (zip, 569.4 MB)

Index Terms

Computing methodologies > Artificial intelligence > Computer vision

Published in
SIGGRAPH '23: ACM SIGGRAPH 2023 Conference Proceedings
July 2023
911 pages
ISBN: 9798400701597
DOI: 10.1145/3588432
Editors: Erik Brunvand, Alla Sheffer, Michael Wimmer
Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GANs
interactive image manipulation
point tracking
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,822 of 8,601 submissions, 21%