Abstract
Few-shot video object segmentation (FSVOS) is a challenging task that aims to segment objects of novel classes across query videos given only a few annotated support images. Meta learning is the standard approach to few-shot tasks. However, current meta learners ignore contextual information and make little use of the temporal information in videos. Moreover, the trained models are biased towards the seen base classes, which hinders the recognition of novel classes. To address these problems, we propose contextual augmentation with bias adaptation for few-shot video object segmentation, consisting of a context augmented learner (CAL) and a bias adaptive learner (BAL). The context augmented learner exploits the contextual information in the video and guides the meta learner to produce coarse predictions. The bias adaptive learner then compensates for the bias against novel classes: it employs a base class learner to identify base classes and computes the similarity between the query video and the support set, guiding the adaptive integration of the coarse and base-class results into accurate segmentations. Experiments on the YouTube-VIS dataset demonstrate that our approach achieves state-of-the-art performance.
S. Wang and Z. Liu contributed equally to this work.
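The abstract only outlines how the BAL branch fuses the meta learner's coarse prediction with the base class learner's output, so the following is a minimal PyTorch sketch of that similarity-guided fusion idea. The function name `bias_adaptive_fusion`, all tensor shapes, and the specific weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def bias_adaptive_fusion(meta_pred, base_pred, query_feat, support_protos):
    """Hypothetical sketch of BAL-style fusion (details assumed).

    meta_pred:      (T, 1, H, W) coarse foreground logits from the meta learner
    base_pred:      (T, 1, H, W) base-class logits from the base class learner
    query_feat:     (T, C) global embeddings of the query-video frames
    support_protos: (K, C) prototypes pooled from the annotated support images
    """
    # Cosine similarity between each query frame and every support prototype,
    # reduced to one per-frame confidence that the frame contains the novel class.
    sim = F.cosine_similarity(
        query_feat.unsqueeze(1), support_protos.unsqueeze(0), dim=-1
    )                                          # (T, K)
    w = sim.max(dim=1).values.clamp(0, 1)      # (T,)
    w = w.view(-1, 1, 1, 1)                    # broadcast over (1, H, W)

    # High similarity: trust the meta learner's novel-class prediction.
    # Low similarity: suppress regions the base learner claims as base classes
    # (one plausible way to counter the seen-class bias; the paper's scheme may differ).
    fused = w * meta_pred + (1 - w) * (meta_pred - base_pred.sigmoid())
    return fused

# Toy usage: 5 frames, 3 support prototypes, 64-dim features, 32x32 masks.
fused = bias_adaptive_fusion(
    torch.randn(5, 1, 32, 32), torch.randn(5, 1, 32, 32),
    torch.randn(5, 64), torch.randn(3, 64),
)
```

The design choice sketched here is that the support-query similarity acts as a gate: frames that resemble the support set keep the coarse prediction, while dissimilar frames lean on the base learner to suppress seen-class regions.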
Acknowledgements
This work was supported in part by Zhejiang Provincial Natural Science Foundation of China (No. LDT23F0202, No. LDT23F02021F02, No. LQ22F020013) and the National Natural Science Foundation of China (No. 62036009, No. 62106226).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, S. et al. (2024). Contextual Augmentation with Bias Adaptive for Few-Shot Video Object Segmentation. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14554. Springer, Cham. https://doi.org/10.1007/978-3-031-53305-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53304-4
Online ISBN: 978-3-031-53305-1