DOI: 10.1145/3460426.3463611
Research Article | Open Access

Image-to-Image Transfer Makes Chaos to Order

Published: 01 September 2021

ABSTRACT

GAN-based image-to-image transfer tools have achieved remarkable results in image generation. However, most research efforts focus on changing style features, e.g., color and texture, while spatial features, e.g., the locations of objects in the input and output images, are kept consistent. If these tools are instead asked to translate locations, such as rearranging the objects of a chaotic scene into an orderly scene (i.e., chaos to order), can they still work well? We investigate this image-to-image location transfer problem and reach a preliminary conclusion: it is hard to manipulate the spatial features of objects in raw images automatically. In this paper, we propose a novel framework called LT-GAN to address this issue. Specifically, we design a multi-stage generation structure in which location translation is performed on semantic labels, used as a bridge, to strengthen the automatic manipulation of the spatial features of raw images. Experimental results demonstrate the effectiveness of the proposed multi-stage generation strategy. In addition, we explore a Color Histogram Loss that evaluates the similarity of the color distributions of a chaotic scene image and the corresponding orderly scene image. Combining feature extraction with the Color Histogram Loss significantly improves the quality of the orderly scene images generated by the final stage of LT-GAN. Moreover, to overcome the limitations of existing public datasets for image-to-image transfer tasks, we construct a new dataset named M2C for this new location-transfer scenario, containing more than 15,000 paired images and their corresponding semantic labels. The dataset is available at https://drive.google.com/open?id=1amr9ga9wvhnIzeZ48OHbLapHGqOb4-Up
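To make the multi-stage structure concrete, the sketch below shows one plausible reading of the pipeline: segment the chaotic image into a semantic label map, translate object locations on that label map (the "bridge"), and render the reordered labels back into an orderly image. The three-stage decomposition, module names, and interfaces are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LTGANPipeline(nn.Module):
    """Hypothetical three-stage chaos-to-order pipeline (an assumed reading)."""

    def __init__(self, segmenter: nn.Module, label_translator: nn.Module,
                 renderer: nn.Module):
        super().__init__()
        self.segmenter = segmenter                # raw image -> chaotic label map
        self.label_translator = label_translator  # chaotic -> orderly label map
        self.renderer = renderer                  # orderly label map -> raw image

    def forward(self, chaotic_image: torch.Tensor) -> torch.Tensor:
        chaotic_labels = self.segmenter(chaotic_image)           # stage 1: segment
        orderly_labels = self.label_translator(chaotic_labels)   # stage 2: relocate
        return self.renderer(orderly_labels)                     # stage 3: render
```

Working on label maps rather than raw pixels means each stage can be trained and inspected separately, which helps explain why semantic labels serve well as a bridge for location transfer.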
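The Color Histogram Loss can be sketched similarly. A common way to make histogram comparison differentiable is soft binning with a triangular kernel, in the spirit of the histogram loss of Ustinova and Lempitsky; the bin count, kernel, and L1 distance below are assumptions, so the paper's exact formulation may differ.

```python
import torch

def soft_histogram(channel: torch.Tensor, bins: int = 64) -> torch.Tensor:
    """Differentiable histogram of one color channel with values in [0, 1]."""
    centers = torch.linspace(0.0, 1.0, bins, device=channel.device)
    width = 1.0 / (bins - 1)
    # Triangular kernel: each pixel votes for its two nearest bin centers.
    votes = torch.clamp(1.0 - (channel.reshape(-1, 1) - centers).abs() / width, min=0.0)
    hist = votes.sum(dim=0)
    return hist / hist.sum()  # normalize to a probability distribution

def color_histogram_loss(img_a: torch.Tensor, img_b: torch.Tensor,
                         bins: int = 64) -> torch.Tensor:
    """L1 distance between per-channel color histograms of two (C, H, W) images."""
    return sum(
        torch.abs(soft_histogram(img_a[c], bins) - soft_histogram(img_b[c], bins)).sum()
        for c in range(img_a.shape[0])
    )
```

In training, this term would presumably be added to the adversarial objective with a weighting coefficient, so that the generated orderly image keeps the color distribution of its chaotic counterpart even when object locations change.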


Published in

    ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
    August 2021
    715 pages
ISBN: 9781450384636
DOI: 10.1145/3460426

    Copyright © 2021 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

Overall Acceptance Rate: 254 of 830 submissions (31%)
