Abstract
Clothing parsing tasks have attracted considerable attention because of their wide application. The challenge of clothing parsing is that clothing images have many characteristics, such as complex textures, different styles, and changeful human postures, so the task of clothing analysis needs to consider rich semantic features and accurate spatial information. However, due to repeated down-sampling operations, the current semantic segmentation networks are easy to lose a lot of spatial information. We propose a feature fusion network, which consists of multistage fusion network and edge perceiving network, can better capture the details. Experimental results show that our proposed method achieves state-of-art performance on the fashion clothing and the LIP datasets. Especially, our model achieves an average f1-score of 61.54\(\%\) on the fashion clothing test set and a mean IoU score of 53.58\(\%\) on the LIP validation set.
Similar content being viewed by others
References
Liu S, Song Z, Liu G, Xu C, Lu H, Yan S (2012) Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3330–3337
Wang X, Zhang T (2011) Clothes search in consumer photos via color matching and attribute learning. In: Proceedings of the 19th ACM international conference on multimedia, pp 1353–1356
Liu S, Feng J, Song Z, Zhang T, Lu H, Xu C, Yan S (2012) Hi, magic closet, tell me what to wear! In: Proceedings of the 20th ACM international conference on multimedia, pp 619–628
Kalantidis Y, Kennedy L, Li L-J (2013) Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos. In: Proceedings of the 3rd ACM conference on international conference on multimedia retrieval, pp 105–112
Xing L, Zhang J, Liang H, Li Z (2018) Intelligent recognition of dominant colors for Chinese traditional costumes based on a mean shift clustering method. J Text Inst 109(10):1304–1314
Xiao Z, Liu X, Wu J, Geng L, Sun Y, Zhang F, Tong J (2018) Knitted fabric structure recognition based on deep learning. J Text Inst 109(9):1217–1223
Wang H, Duan F, Zhou W (2021) Fabric defect detection under complex illumination based on an improved recurrent attention model. J Text Inst 112(8):1273–1279
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Lin G, Milan A, Shen C, Reid I (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925–1934
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818
Islam MA, Naha S, Rochan M, Bruce N, Wang Y (2017) Label refinement network for coarse-to-fine semantic segmentation. arXiv preprint arXiv:1703.00551
Hu T, Yang M, Yang W, Li A (2019) An end-to-end differential network learning method for semantic segmentation. Int J Mach Learn Cybern 10(7):1909–1924
Aslan S, Ciocca G, Mazzini D, Schettini R (2020) Benchmarking algorithms for food localization and semantic segmentation. Int J Mach Learn Cybern 11(12):2827–2847
Luo X, Su Z, Guo J, Zhang G, He X (2018) Trusted guidance pyramid network for human parsing. In: Proceedings of the 26th ACM international conference on multimedia, pp 654–662
Gong K, Liang X, Zhang D, Shen X, Lin L (2017) Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 932–940
Zhang S, Song Z, Cao X, Zhang H, Zhou J (2019) Task-aware attention model for clothing attribute prediction. IEEE Trans Circuits Syst Video Technol 30(4):1051–1064
Liu S, Feng J, Domokos C, Xu H, Huang J, Hu Z, Yan S (2013) Fashion parsing with weak color-category labels. IEEE Trans Multimed 16(1):253–265
Zhu S, Urtasun R, Fidler S, Lin D, Change Loy C (2017) Be your own prada: fashion synthesis with structural coherence. In: Proceedings of the IEEE international conference on computer vision, pp 1680–1688
Yamaguchi K, Kiapour MH, Ortiz LE, Berg TL (2012) Parsing clothing in fashion photographs. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3570–3577
Simo-Serra E, Fidler S, Moreno-Noguer F, Urtasun R (2014) A high performance crf model for clothes parsing. In: Asian conference on computer vision. Springer, pp 64–81
Khurana T, Mahajan K, Arora C, Rai A (2018) Exploiting texture cues for clothing parsing in fashion images. In: 2018 25th IEEE international conference on image processing (ICIP). IEEE, pp 2102–2106
Liang X, Lin L, Yang W, Luo P, Huang J, Yan S (2016) Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval. IEEE Trans Multimed 18(6):1175–1186
Tangseng P, Wu Z, Yamaguchi K (2017) Looking at outfit to parse clothing. arXiv preprint arXiv:1703.01386
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241
Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B et al (2018) Attention u-net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999
Li H, Xiong P, An J, Wang L (2018) Pyramid attention network for semantic segmentation. arXiv preprint arXiv:1805.10180
Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Bisenet: Bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 325–341
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Yamaguchi K, Hadi Kiapour M, Berg TL (2013) Paper doll parsing: retrieving similar styles to parse clothing items. In: Proceedings of the IEEE international conference on computer vision, pp 3519–3526
Chen L-C, Yang Y, Wang J, Xu W, Yuille AL (2016) Attention to scale: scale-aware semantic image segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3640–3649
Wang W, Zhang Z, Qi S, Shen J, Pang Y, Shao L (2019) Learning compositional neural information fusion for human parsing. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5703–5713
Wang W, Zhu H, Dai J, Pang Y, Shen J, Shao L (2020) Hierarchical human parsing with typed part-relation reasoning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8929–8939
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin ZM, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in PyTorch. In: 31st conference on neural information processing systems, Long Beach, pp 1–4
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Luc P, Couprie C, Chintala S, Verbeek J (2016) Semantic segmentation using adversarial networks. arXiv preprint arXiv:1611.08408
Luo Y, Zheng Z, Zheng L, Guan T, Yu J, Yang Y (2018) Macro-micro adversarial network for human parsing. In: Proceedings of the European conference on computer vision (ECCV), pp 418–434
Liang X, Gong K, Shen X, Lin L (2018) Look into person: joint body parsing and pose estimation network and a new benchmark. IEEE Trans Pattern Anal Mach Intell 41(4):871–885
Ruan T, Liu T, Huang Z, Wei Y, Wei S, Zhao Y (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 4814–4821
Yang W, Luo P, Lin L (2014) Clothing co-parsing by joint image segmentation and labeling. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3182–3189
Cheng B, Xiao B, Wang J, Shi H, Huang TS, Zhang L (2020) Higherhrnet: Scale-aware representation learning for bottom-up human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5386–5395
Alzubi OA, Alzubi JA, Alweshah M, Qiqieh I, Al-Shami S, Ramachandran M (2020) An optimal pruning algorithm of classifier ensembles: dynamic programming approach. Neural Comput Appl 32(20):16091–16107
Alzubi OA, Alzubi JAA, Tedmori S, Rashaideh H, Almomani O (2018) Consensus-based combining method for classifier ensembles. Int Arab J Inf Technol 15(1):76–86
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Chen, L., Yu, E. & Cong, H. Feature fusion network for clothing parsing. Int. J. Mach. Learn. & Cyber. 13, 2229–2238 (2022). https://doi.org/10.1007/s13042-022-01519-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-022-01519-5