A semantic segmentation algorithm for fashion images based on modified mask RCNN

Multimedia Tools and Applications

Abstract

The semantic segmentation of human body images has great application potential in fields such as autonomous driving, artificial intelligence (AI) face changing, and virtual try-on. Many researchers now use additional human body posture information to generate multi-level human body analysis images. However, existing methods have limitations when faced with multiple poses and overlapping targets. In this paper, we propose a novel algorithm based on Mask RCNN that achieves pixel-level accuracy. In the feature extraction process, a multi-scale feature fusion module applying dilated convolution is proposed to obtain richer semantic information from different receptive fields. We add a small residual module to the original residual unit structure to enlarge the receptive field of each layer and capture both details and global characteristics. Three convolution kernels with different ratios are designed to obtain receptive fields of different scales. Experimental results show that our method performs better when both object positioning and target classification are considered.
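To make the idea of multi-scale receptive fields concrete, the following is a minimal one-dimensional sketch in plain Python, not the authors' implementation. It assumes that the "different ratios" act like dilation rates, and that the branches are fused by element-wise summation; the abstract does not state either detail. The function names (`effective_kernel_size`, `dilated_conv1d_same`, `multi_scale_fuse`) and the rates `(1, 2, 3)` are hypothetical and chosen only for illustration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d_same(signal, kernel, dilation):
    """1-D dilated cross-correlation with zero padding, so the output
    has the same length as the input (odd kernel sizes only)."""
    k = len(kernel)
    if k % 2 != 1:
        raise ValueError("this sketch only handles odd kernel sizes")
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[pos + i * dilation] for i, w in enumerate(kernel))
        for pos in range(len(signal))
    ]


def multi_scale_fuse(signal, kernel, dilations=(1, 2, 3)):
    """Run the same kernel at several dilation rates and sum the branches.
    Summation is an assumption; concatenation is another common choice."""
    branches = [dilated_conv1d_same(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    # A 3-tap kernel covers 3, 5, and 7 input positions at dilations 1, 2, 3.
    print([effective_kernel_size(3, d) for d in (1, 2, 3)])
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    print(multi_scale_fuse(impulse, [1, 1, 1]))
```

The same three weights reach progressively farther from the center as the dilation grows, which is why stacking or fusing such branches can widen the receptive field without adding parameters per branch.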

Funding

This work was supported by the National Natural Science Foundation of China (No. 61976105) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX22_2342).

Author information

Corresponding author

Correspondence to Ruru Pan.

Ethics declarations

Data sharing

Data sharing is not applicable to this article, as no new data were created or analyzed in this study.

Conflict of interest

Wentao He, Jing’an Wang, Lei Wang, Ruru Pan, and Weidong Gao declare that they have no conflict of interest.

About this article

Cite this article

He, W., Wang, J., Wang, L. et al. A semantic segmentation algorithm for fashion images based on modified mask RCNN. Multimed Tools Appl 82, 28427–28444 (2023). https://doi.org/10.1007/s11042-023-14958-1

  • DOI: https://doi.org/10.1007/s11042-023-14958-1
