ABSTRACT
GAN-based image-to-image translation tools have achieved remarkable results in image generation. However, most research efforts focus on changing style features, e.g., color and texture, while spatial features, e.g., the locations of objects, remain consistent between the input and output images. Can these tools also translate locations, for example by reorganizing objects from a chaotic scene into an orderly one (i.e., chaos to order)? We investigate this problem of image-to-image location transfer and reach a preliminary conclusion: it is hard to manipulate the spatial features of objects in raw images automatically. In this paper, we propose a novel framework called LT-GAN to address this issue. Specifically, we design a multi-stage generation structure in which location translation is performed with semantic labels as a bridge, enhancing the automatic manipulation of spatial features in raw images. Experimental results demonstrate the effectiveness of the proposed multi-stage generation strategy. In addition, we explore a Color Histogram Loss that evaluates the similarity between the color distribution of a chaotic scene image and that of the corresponding orderly scene image. By combining feature extraction with the Color Histogram Loss, LT-GAN significantly improves the quality of the orderly scene images generated in the final stage. Moreover, to overcome the limitations of public datasets for image-to-image transfer tasks, we construct a new dataset named M2C for this new application scenario of location transfer, comprising more than 15,000 paired images and their corresponding semantic labels. The dataset is available at https://drive.google.com/open?id=1amr9ga9wvhnIzeZ48OHbLapHGqOb4-Up
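The exact formulation of the Color Histogram Loss is not given in this excerpt. As a rough illustration only, one plausible form compares per-channel normalized color histograms of the two images with an L1 distance; the function name, bin count, and distance metric below are assumptions, not the paper's definition:

```python
import numpy as np

def color_histogram_loss(img_a, img_b, bins=64):
    """Average L1 distance between per-channel normalized color histograms.

    img_a, img_b: uint8 RGB images of shape (H, W, 3). Returns 0.0 for
    images with identical color distributions, larger values otherwise.
    """
    loss = 0.0
    for c in range(3):
        # Histogram each color channel over the full 0-255 range.
        ha, _ = np.histogram(img_a[..., c], bins=bins, range=(0, 256))
        hb, _ = np.histogram(img_b[..., c], bins=bins, range=(0, 256))
        # Normalize to probability distributions so image size cancels out.
        ha = ha / ha.sum()
        hb = hb / hb.sum()
        loss += np.abs(ha - hb).sum()
    return loss / 3.0
```

A histogram-based term of this kind is insensitive to object positions, which is consistent with the chaos-to-order setting: the orderly output should preserve the input's colors even though object locations change.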