Abstract
Few-shot video object segmentation (FSVOS) is a challenging task that aims to segment objects of novel classes across query videos given only a few annotated support images. Meta learning is the standard approach to few-shot tasks. However, current meta learners ignore contextual information and make little use of the temporal information in videos. Moreover, the trained models are biased towards the seen base classes, which hinders the recognition of novel classes. To address these problems, we propose contextual augmentation with bias adaptation for few-shot video object segmentation, consisting of a context augmented learner (CAL) and a bias adaptive learner (BAL). The context augmented learner exploits the contextual information in the video and guides the meta learner to produce coarse predictions. The bias adaptive learner then compensates for the bias against novel classes: it employs a base class learner to identify base classes and computes the similarity between the query video and the support set, guiding the adaptive integration of the coarse and base-class results into accurate segmentations. Experiments on the YouTube-VIS dataset demonstrate that our approach achieves state-of-the-art performance.
S. Wang and Z. Liu contributed equally to this work.
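The abstract only outlines how the BAL branch fuses the meta learner's coarse prediction with the base class learner's output, so the following is a minimal PyTorch sketch of that similarity-guided fusion idea. The function name `bias_adaptive_fusion`, all tensor shapes, and the specific weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def bias_adaptive_fusion(meta_pred, base_pred, query_feat, support_protos):
    """Hypothetical sketch of BAL-style fusion (details assumed).

    meta_pred:      (T, 1, H, W) coarse foreground logits from the meta learner
    base_pred:      (T, 1, H, W) base-class logits from the base class learner
    query_feat:     (T, C) global embeddings of the query-video frames
    support_protos: (K, C) prototypes pooled from the annotated support images
    """
    # Cosine similarity between each query frame and every support prototype,
    # reduced to one per-frame confidence that the frame contains the novel class.
    sim = F.cosine_similarity(
        query_feat.unsqueeze(1), support_protos.unsqueeze(0), dim=-1
    )                                          # (T, K)
    w = sim.max(dim=1).values.clamp(0, 1)      # (T,)
    w = w.view(-1, 1, 1, 1)                    # broadcast over (1, H, W)

    # High similarity: trust the meta learner's novel-class prediction.
    # Low similarity: suppress regions the base learner claims as base classes
    # (one plausible way to counter the seen-class bias; the paper's scheme may differ).
    fused = w * meta_pred + (1 - w) * (meta_pred - base_pred.sigmoid())
    return fused

# Toy usage: 5 frames, 3 support prototypes, 64-dim features, 32x32 masks.
fused = bias_adaptive_fusion(
    torch.randn(5, 1, 32, 32), torch.randn(5, 1, 32, 32),
    torch.randn(5, 64), torch.randn(3, 64),
)
```

The design choice sketched here is that the support-query similarity acts as a gate: frames that resemble the support set keep the coarse prediction, while dissimilar frames lean on the base learner to suppress seen-class regions.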
Acknowledgements
This work was supported in part by Zhejiang Provincial Natural Science Foundation of China (No. LDT23F0202, No. LDT23F02021F02, No. LQ22F020013) and the National Natural Science Foundation of China (No. 62036009, No. 62106226).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, S. et al. (2024). Contextual Augmentation with Bias Adaptive for Few-Shot Video Object Segmentation. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14554. Springer, Cham. https://doi.org/10.1007/978-3-031-53305-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53304-4
Online ISBN: 978-3-031-53305-1