OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

We present OSFormer, the first one-stage transformer framework for camouflaged instance segmentation (CIS). OSFormer is based on two key designs. First, we design a location-sensing transformer (LST) to obtain the location label and instance-aware parameters by introducing the location-guided queries and the blend-convolution feed-forward network. Second, we develop a coarse-to-fine fusion (CFF) to merge diverse context information from the LST encoder and CNN backbone. Coupling these two components enables OSFormer to efficiently blend local features and long-range context dependencies for predicting camouflaged instances. Compared with two-stage frameworks, our OSFormer reaches 41% AP and achieves good convergence efficiency without requiring enormous training data, i.e., only 3,040 samples within 60 epochs. Code link: https://github.com/PJLallen/OSFormer.
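The abstract states that the LST produces instance-aware parameters that are used to predict per-instance masks. The sketch below illustrates the general dynamic-filter idea behind such heads (in the style of CondInst/SOLOv2-like dynamic convolution): each instance's predicted parameters act as a filter applied to a shared mask feature map. All shapes, the single 1x1-conv layer, and the function name are illustrative assumptions, not the paper's actual head.

```python
import numpy as np

def dynamic_mask_head(feat, params):
    """Apply per-instance dynamic 1x1 filters to a shared mask feature map.

    feat:   (H, W, d) shared mask feature map
    params: (num_inst, d) per-instance predicted filter weights
            (assumption: one 1x1 conv layer per instance, for illustration)
    Returns (num_inst, H, W) per-instance mask probabilities in (0, 1).
    """
    # Contract the channel dim of the shared feature with each instance's filter.
    logits = np.einsum('hwd,nd->nhw', feat, params)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

# Hypothetical sizes for demonstration only.
H = W = 64
d = 8
feat = np.random.randn(H, W, d)
params = np.random.randn(5, d)  # parameters for 5 predicted instances
masks = dynamic_mask_head(feat, params)
print(masks.shape)  # (5, 64, 64)
```

The appeal of this design, and of one-stage CIS generally, is that mask prediction reduces to a cheap convolution against a single shared feature map rather than per-RoI feature cropping as in two-stage frameworks.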

J. Pei and T. Cheng—Equal contributions.


Notes

  1. We split and restore \(X_{e}\) to the 2D representations \(T_3\in \mathbb {R}^{{\frac{H}{8}}\times {\frac{W}{8}}\times D}\), \(T_4\in \mathbb {R}^{{\frac{H}{16}}\times {\frac{W}{16}}\times D}\), and \(T_5\in \mathbb {R}^{{\frac{H}{32}}\times {\frac{W}{32}}\times D}\).
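The note above describes splitting the flattened encoder sequence \(X_e\) back into three 2D feature maps at strides 8, 16, and 32. A minimal sketch of that split-and-reshape, with assumed concrete values for H, W, and D:

```python
import numpy as np

# Illustrative sizes (the paper uses the input image size H x W and an
# embedding dimension D; these concrete values are assumptions).
H, W, D = 256, 256, 32

# Sequence lengths of the three flattened scales concatenated in X_e.
n3 = (H // 8) * (W // 8)
n4 = (H // 16) * (W // 16)
n5 = (H // 32) * (W // 32)

X_e = np.random.rand(n3 + n4 + n5, D)  # flattened encoder output

# Split the sequence into per-scale chunks, then restore the 2D layout.
T3 = X_e[:n3].reshape(H // 8, W // 8, D)
T4 = X_e[n3:n3 + n4].reshape(H // 16, W // 16, D)
T5 = X_e[n3 + n4:].reshape(H // 32, W // 32, D)

print(T3.shape, T4.shape, T5.shape)  # (32, 32, 32) (16, 16, 32) (8, 8, 32)
```

Because the three scales are simply concatenated along the sequence axis, the split is a pair of index offsets followed by reshapes; no learned operation is involved.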


Author information

Corresponding author: Deng-Ping Fan.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pei, J., Cheng, T., Fan, D.P., Tang, H., Chen, C., Van Gool, L. (2022). OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_2

  • DOI: https://doi.org/10.1007/978-3-031-19797-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19796-3

  • Online ISBN: 978-3-031-19797-0
