
Contextual Augmentation with Bias Adaptive for Few-Shot Video Object Segmentation

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14554)


Abstract

Few-shot video object segmentation (FSVOS) is a challenging task that aims to segment objects of novel classes across query videos given only a few annotated support images. Meta learning is the dominant approach to few-shot tasks, but current meta learners ignore contextual information and make little use of the temporal information in videos. Moreover, trained models are biased toward the seen (base) classes, which hinders the recognition of novel classes. To address these problems, we propose a contextual augmentation and bias adaptation framework for few-shot video object segmentation, consisting of a context augmented learner (CAL) and a bias adaptive learner (BAL). The context augmented learner exploits the contextual information in the video and guides the meta learner to produce coarse predictions. The bias adaptive learner then corrects the bias against novel classes: it uses a base class learner to identify base classes and computes the similarity between the query video and the support set, which guides the adaptive integration of the coarse and base-class predictions into an accurate segmentation. Experiments on the YouTube-VIS dataset demonstrate that our approach achieves state-of-the-art performance.
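To make the two-branch design concrete, below is a minimal, hypothetical PyTorch sketch of the fusion step. All names and shapes here (BiasAdaptiveFusion, support_proto, the cosine-similarity weighting) are illustrative assumptions based on the abstract, not the authors' implementation; the paper's actual BAL may differ substantially.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasAdaptiveFusion(nn.Module):
    """Hypothetical fusion of the meta learner's coarse novel-class mask
    with a base-class learner's prediction, weighted by query-support
    similarity (a sketch of the BAL idea, not the authors' code)."""

    def forward(self, coarse_logits, base_logits, query_feat, support_proto):
        # Cosine similarity between pooled per-frame query features and the
        # support prototype serves as a per-frame fusion weight (assumption).
        q = F.adaptive_avg_pool2d(query_feat, 1).flatten(1)   # (T, C)
        w = F.cosine_similarity(q, support_proto, dim=1)      # (T,)
        w = w.clamp(min=0).view(-1, 1, 1, 1)                  # (T, 1, 1, 1)
        # Down-weight novel-class activations wherever the base learner is
        # confident the pixel belongs to a seen (base) class.
        base_bg = 1.0 - base_logits.sigmoid()
        return coarse_logits * (w + (1.0 - w) * base_bg)

# Toy usage: T=5 frames, C=256 feature channels, 64x64 masks.
fusion = BiasAdaptiveFusion()
coarse = torch.randn(5, 1, 64, 64)   # coarse logits from the meta learner
base = torch.randn(5, 1, 64, 64)     # base-class learner logits
feat = torch.randn(5, 256, 64, 64)   # query video features
proto = torch.randn(1, 256)          # support-set prototype
print(fusion(coarse, base, feat, proto).shape)  # torch.Size([5, 1, 64, 64])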

S. Wang and Z. Liu contributed equally to this work.



Acknowledgements

This work was supported in part by Zhejiang Provincial Natural Science Foundation of China (No. LDT23F0202, No. LDT23F02021F02, No. LQ22F020013) and the National Natural Science Foundation of China (No. 62036009, No. 62106226).

Author information

Corresponding author: Jie Lei.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, S. et al. (2024). Contextual Augmentation with Bias Adaptive for Few-Shot Video Object Segmentation. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14554. Springer, Cham. https://doi.org/10.1007/978-3-031-53305-1_27


  • DOI: https://doi.org/10.1007/978-3-031-53305-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53304-4

  • Online ISBN: 978-3-031-53305-1

  • eBook Packages: Computer Science (R0)
