Real-time instance segmentation with assembly parallel task

Yang, Zhen; Wang, Yang; Yang, Fan; Yin, Zhijian; Zhang, Tao

doi:10.1007/s00371-022-02537-8

Original article
Published: 16 June 2022

Volume 39, pages 3937–3947, (2023)
The Visual Computer

Zhen Yang^1,2, Yang Wang^1, Fan Yang^1, Zhijian Yin^1 & Tao Zhang^3,4 (ORCID: orcid.org/0000-0002-7192-5153)

Abstract

Although instance segmentation has made significant progress in recent years, developing highly accurate algorithms that run in real time remains a challenge. In this paper, we propose APTMask, a real-time instance segmentation framework that builds on the real-time project YOLACT. APTMask uses Swin-Transformer Tiny with PA-FPN as the default feature backbone and a base image size of \( 544\times 544 \). We devise a new mask branch that, compared to implicit parameterized forms, more effectively exploits the semantic information in the deeper PA-FPN features and the positional information in the shallow features for mask representation. We replace Fast NMS with Cluster NMS, which compensates for the performance penalty of Fast NMS compared to standard NMS. We also adopt the CIoU loss to fully exploit the scale information in the aspect ratio of the bounding box. Experimental results show that APTMask achieves 39.7/34.7 box/mask AP on the COCO val2017 dataset at 31.8 fps, evaluated on a single RTX 2080 Ti GPU. Compared to YOLACT, APTMask improves the box AP by about 8.0% and the mask AP by 6.2%, which is encouraging and competitive. Given its simplicity and efficiency, we hope that APTMask can serve as a simple but strong baseline for a variety of instance-wise prediction tasks.
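
The difference between Fast NMS and Cluster NMS mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a pure-Python illustration written under the assumptions that boxes are given as (x1, y1, x2, y2) tuples, are already sorted by descending score, and belong to a single class. The function names and the three example boxes are hypothetical. Fast NMS lets a box that is itself suppressed still suppress others; Cluster NMS repeats the same matrix test while ignoring rows of suppressed boxes until the keep mask stops changing, which matches greedy (standard) NMS on this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def upper_iou_matrix(boxes):
    """IoU matrix keeping only entries with i < j (higher-scored row vs lower-scored column)."""
    n = len(boxes)
    return [[iou(boxes[i], boxes[j]) if i < j else 0.0 for j in range(n)]
            for i in range(n)]


def greedy_nms(boxes, thresh=0.5):
    """Standard sequential NMS: only boxes that were kept may suppress later ones."""
    keep = []
    for j in range(len(boxes)):
        keep.append(all((not keep[i]) or iou(boxes[i], boxes[j]) <= thresh
                        for i in range(j)))
    return keep


def fast_nms(boxes, thresh=0.5):
    """Fast NMS: one pass over the matrix; suppressed boxes can still suppress."""
    x = upper_iou_matrix(boxes)
    n = len(boxes)
    return [all(x[i][j] <= thresh for i in range(n)) for j in range(n)]


def cluster_nms(boxes, thresh=0.5):
    """Cluster NMS sketch: repeat the matrix test, skipping rows of suppressed boxes."""
    x = upper_iou_matrix(boxes)
    n = len(boxes)
    keep = [True] * n
    for _ in range(n):
        new_keep = [all((not keep[i]) or x[i][j] <= thresh for i in range(n))
                    for j in range(n)]
        if new_keep == keep:  # keep mask converged
            break
        keep = new_keep
    return keep


# Hypothetical chain of boxes: A overlaps B, B overlaps C, A barely overlaps C.
boxes = [(0, 0, 10, 10), (2.5, 0, 12.5, 10), (5, 0, 15, 10)]
fast_keep = fast_nms(boxes)
cluster_keep = cluster_nms(boxes)
greedy_keep = greedy_nms(boxes)
```

In this example the middle box is suppressed by the first one. Fast NMS also discards the third box because it overlaps the (already suppressed) middle box, whereas Cluster NMS and greedy NMS both retain it.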

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62061019, 61866016), Jiangxi Provincial Natural Science Foundation (20202BABL202014, 20212BAB202013), the Key Project of Jiangxi Education Department (GJJ201107, GJJ190587), and the Key Laboratory of System Control and Information Processing, Ministry of Education (Scip202106).

Author information

Authors and Affiliations

School of Communication and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
Zhen Yang, Yang Wang, Fan Yang & Zhijian Yin
Guangdong Atv Academy for Performing Arts, Dongguan, China
Zhen Yang
Department of Automation, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, China
Tao Zhang
Shanghai Key Laboratory of Intelligent Sensing and Recognition, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, 200240, China
Tao Zhang

Corresponding author

Correspondence to Tao Zhang.

About this article

Cite this article

Yang, Z., Wang, Y., Yang, F. et al. Real-time instance segmentation with assembly parallel task. Vis Comput 39, 3937–3947 (2023). https://doi.org/10.1007/s00371-022-02537-8

Accepted: 16 May 2022
Published: 16 June 2022
Issue Date: September 2023
DOI: https://doi.org/10.1007/s00371-022-02537-8
