
SFOD-Trans: semi-supervised fine-grained object detection framework with transformer module

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

Because labeling medical images for object detection is very costly, semi-supervised learning methods for medical images are investigated. In this paper, a semi-supervised fine-grained object detection framework with a transformer module (SFOD-Trans) is proposed for hepatic portal vein detection. It adopts Sparse R-CNN as the backbone. In the detection model, a transformer module is introduced and a contrastive loss is added to improve the performance of fine-grained object detection. To transfer information between labeled and unlabeled images, a new fusion module named normalized ROI fusion (NRF) is designed based on the characteristics of the hepatic portal vein. We ran extensive experiments on a dataset of 1000 real CT scans. The results show that the Average Precision (AP) and Average Recall (AR) of the proposed method reach 0.773 and 0.831, respectively, with 300 labeled and 1500 unlabeled samples.
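The abstract reports AP and AR but does not state the evaluation protocol (IoU thresholds, averaging scheme). As a rough illustration only, the sketch below shows how precision and recall are commonly computed for box detections at a single IoU threshold. The function names, the corner box format, and the 0.5 threshold are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]], gts: List[Box], thr: float = 0.5
) -> Tuple[float, float]:
    """Greedy matching: highest-scoring predictions claim ground-truth boxes first."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall
```

Benchmark-style AP and AR are usually obtained by averaging such values over score cut-offs and, in some protocols, over several IoU thresholds; which variant the authors used is not stated here.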

Graphic abstract

An overview of the semi-supervised fine-grained object detection framework with transformer module (SFOD-Trans). Two parallel branches are trained with a supervised loss and a semi-supervised loss, respectively.
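The overview describes a supervised branch and a semi-supervised branch, but not how their losses are combined. A common pattern in semi-supervised training, shown here purely as an assumed sketch, is to add the unlabeled-branch loss to the supervised loss with a weight that ramps up over training. The ramp-up schedule, the function names, and all parameter values below are hypothetical and are not taken from the paper.

```python
import math


def ramp_up_weight(step: int, ramp_steps: int, max_weight: float = 1.0) -> float:
    """Sigmoid-shaped ramp-up (an assumed schedule) from near 0 to max_weight."""
    if ramp_steps <= 0:
        return max_weight
    t = min(max(step / ramp_steps, 0.0), 1.0)  # training progress clipped to [0, 1]
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)


def total_loss(
    sup_loss: float,
    unsup_loss: float,
    step: int,
    ramp_steps: int,
    max_weight: float = 1.0,
) -> float:
    """Supervised loss plus a weighted loss from the unlabeled branch."""
    return sup_loss + ramp_up_weight(step, ramp_steps, max_weight) * unsup_loss
```

Early in training the unlabeled term contributes little, so noisy predictions on unlabeled images do not dominate; once the ramp completes, both branches contribute at their full weights.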

Author information

Corresponding author

Correspondence to Kefeng Li.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Q., Zhang, G., Li, K. et al. SFOD-Trans: semi-supervised fine-grained object detection framework with transformer module. Med Biol Eng Comput 60, 3555–3566 (2022). https://doi.org/10.1007/s11517-022-02682-1

  • DOI: https://doi.org/10.1007/s11517-022-02682-1
