CycleVTON: A Cycle Mapping Framework for Parser-Free Virtual Try-On

Authors

  • Chenghu Du, Wuhan University of Technology
  • Junyin Wang, Wuhan University of Technology
  • Yi Rong, Wuhan University of Technology; Sanya Science and Education Innovation Park, Wuhan University of Technology
  • Shuqing Liu, Wuhan Textile University
  • Kai Liu, Wuhan University of Technology
  • Shengwu Xiong, Wuhan University of Technology; Shanghai AI Laboratory; Sanya Science and Education Innovation Park, Wuhan University of Technology; Qiongtai Normal University

DOI:

https://doi.org/10.1609/aaai.v38i2.27928

Keywords:

CV: Applications, APP: Other Applications, CV: Biometrics, Face, Gesture & Pose, CV: Computational Photography, Image & Video Synthesis, CV: Representation Learning for Vision, HAI: Applications, HAI: Human-in-the-loop Machine Learning, HAI: User Experience and Usability, ML: Applications, ML: Deep Generative Models & Autoencoders, ML: Deep Learning Algorithms

Abstract

Image-based virtual try-on aims to transfer a target clothing item onto a specific person. A significant challenge is that arbitrarily matched clothing-person pairs lack the corresponding ground truth required for supervised learning. A recent pioneering work leveraged an improved CycleGAN to enable one network to generate the desired image for another network during training. However, the result distribution does not differ before and after the clothing change, so using two different networks is unnecessary and may even increase the difficulty of convergence. Furthermore, the human parsing introduced to provide body-structure information in the input also has a negative impact on the try-on result. How can a single network be employed for supervised learning while eliminating human parsing? To tackle these issues, we present a Cycle mapping Virtual Try-On Network (CycleVTON), which produces photo-realistic try-on results using a cycle mapping framework without a parser. In particular, we introduce a flow constraint loss to achieve supervised learning with arbitrarily matched clothing and person as inputs to the deformer, thus naturally mimicking the interaction between clothing and the human body. Additionally, we design a skin generation strategy that adapts to the shape of the target clothing by dynamically adjusting the skin region, i.e., by first removing and then filling skin areas. Extensive experiments conducted on challenging benchmarks demonstrate that our proposed method outperforms state-of-the-art methods.
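To make the cycle mapping idea concrete, the sketch below shows a single network applied twice: once to dress the person in arbitrary target clothing, and once to put the original clothing back, so the round trip can be supervised against the input image. This is a minimal illustrative sketch, not the paper's implementation: the toy architecture (TinyTryOnNet), the function and parameter names (cycle_loss, lam_flow), and the specific flow-cancellation term standing in for the paper's flow constraint loss are all assumptions, since the abstract does not specify them.

```python
# Minimal, self-contained sketch of a cycle-mapping training step (illustrative
# only; the architecture, loss form, and weights are assumptions, not the
# authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTryOnNet(nn.Module):
    """Toy stand-in for the single try-on network reused in both directions."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.to_image = nn.Conv2d(32, 3, 3, padding=1)  # try-on image head
        self.to_flow = nn.Conv2d(32, 2, 3, padding=1)   # appearance-flow head

    def forward(self, person, cloth):
        h = self.body(torch.cat([person, cloth], dim=1))
        return torch.tanh(self.to_image(h)), self.to_flow(h)

def cycle_loss(net, person, cloth_target, cloth_original, lam_flow=0.1):
    # Forward mapping: dress the person in arbitrary target clothing.
    tryon, flow_fwd = net(person, cloth_target)
    # Backward mapping with the SAME network: restore the original clothing.
    recon, flow_bwd = net(tryon, cloth_original)
    # Cycle reconstruction: the round trip should recover the input person,
    # giving a supervised signal despite the unpaired clothing.
    l_cycle = F.l1_loss(recon, person)
    # Hypothetical flow constraint: forward and backward flows should roughly
    # cancel, standing in for the paper's supervision of the deformer.
    l_flow = F.l1_loss(flow_fwd + flow_bwd, torch.zeros_like(flow_fwd))
    return l_cycle + lam_flow * l_flow

if __name__ == "__main__":
    net = TinyTryOnNet()
    person = torch.randn(1, 3, 64, 48)
    cloth_a = torch.randn(1, 3, 64, 48)  # arbitrary target clothing
    cloth_b = torch.randn(1, 3, 64, 48)  # person's original clothing
    loss = cycle_loss(net, person, cloth_a, cloth_b)
    loss.backward()
    print(f"toy cycle loss: {loss.item():.4f}")
```

Because both mappings share one network, the result distributions before and after the clothing change coincide by construction, which is the abstract's argument for dropping the second network used in the CycleGAN-based approach.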

Published

2024-03-24

How to Cite

Du, C., Wang, J., Rong, Y., Liu, S., Liu, K., & Xiong, S. (2024). CycleVTON: A Cycle Mapping Framework for Parser-Free Virtual Try-On. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1618-1625. https://doi.org/10.1609/aaai.v38i2.27928

Issue

Vol. 38 No. 2 (2024)

Section

AAAI Technical Track on Computer Vision I