ABSTRACT
GAN-based image-to-image translation tools have achieved remarkable results in image generation. However, most research efforts focus on changing style features, e.g., color and texture, while spatial features, e.g., the locations of objects, remain consistent between the input and output images. Can these tools also translate locations, for example by reorganizing objects from a chaotic scene into an orderly one (i.e., chaos to order)? We investigate this problem of image-to-image location transfer and reach a preliminary conclusion: it is hard to manipulate the spatial features of objects in raw images automatically. In this paper, we propose a novel framework called LT-GAN to address this issue. Specifically, we design a multi-stage generation structure in which location translation is performed with semantic labels as a bridge, enhancing the automatic manipulation of spatial features in raw images. Experimental results demonstrate the effectiveness of the proposed multi-stage generation strategy. In addition, we explore a Color Histogram Loss that evaluates the similarity between the color distribution of a chaotic scene image and that of the corresponding orderly scene image. By combining feature extraction with the Color Histogram Loss, LT-GAN significantly improves the quality of the orderly scene images generated in the final stage. Moreover, to overcome the limitations of public datasets for image-to-image transfer tasks, we construct a new dataset named M2C for this new application scenario of location transfer, comprising more than 15,000 paired images and their corresponding semantic labels. The dataset is available at https://drive.google.com/open?id=1amr9ga9wvhnIzeZ48OHbLapHGqOb4-Up
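The exact formulation of the Color Histogram Loss is not given in this excerpt. As a rough illustration only, one plausible form compares per-channel normalized color histograms of the two images with an L1 distance; the function name, bin count, and distance metric below are assumptions, not the paper's definition:

```python
import numpy as np

def color_histogram_loss(img_a, img_b, bins=64):
    """Average L1 distance between per-channel normalized color histograms.

    img_a, img_b: uint8 RGB images of shape (H, W, 3). Returns 0.0 for
    images with identical color distributions, larger values otherwise.
    """
    loss = 0.0
    for c in range(3):
        # Histogram each color channel over the full 0-255 range.
        ha, _ = np.histogram(img_a[..., c], bins=bins, range=(0, 256))
        hb, _ = np.histogram(img_b[..., c], bins=bins, range=(0, 256))
        # Normalize to probability distributions so image size cancels out.
        ha = ha / ha.sum()
        hb = hb / hb.sum()
        loss += np.abs(ha - hb).sum()
    return loss / 3.0
```

A histogram-based term of this kind is insensitive to object positions, which is consistent with the chaos-to-order setting: the orderly output should preserve the input's colors even though object locations change.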