Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation

Authors

  • Huihui Song B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China
  • Tiankang Su B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China
  • Yuhui Zheng B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China; College of Computer, Qinghai Normal University, Xining 810016, China
  • Kaihua Zhang B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China
  • Bo Liu Walmart Global Tech, Sunnyvale, CA, 94086, USA
  • Dong Liu Netflix Inc, Los Gatos, CA, 95032, USA

DOI:

https://doi.org/10.1609/aaai.v38i5.28295

Keywords:

CV: Segmentation, ML: Unsupervised & Self-Supervised Learning

Abstract

Existing unsupervised video object segmentation methods typically suffer severe performance degradation when tested on out-of-distribution videos. The primary reason is that real-world test data may not follow the independent and identically distributed (i.i.d.) assumption, leading to domain shift. In this paper, we propose a generalizable Fourier augmentation method, applied during training, to improve the generalization ability of the model. To achieve this, we perform the Fast Fourier Transform (FFT) over the intermediate spatial-domain features in each layer to yield the corresponding frequency representations, which consist of amplitude components (encoding scene-aware styles such as the texture, color, and contrast of the scene) and phase components (encoding rich semantics). We produce a variety of style features via Gaussian sampling to augment the training data, thereby improving the generalization capability of the model. To further improve cross-domain generalization, we design a phase feature update strategy that applies an exponential moving average to phase features from past frames in an online manner, which helps the model learn cross-domain-invariant features. Extensive experiments show that our proposed method achieves state-of-the-art performance on popular benchmarks.
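
The amplitude/phase decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fourier_augment` and `ema_update`, the per-channel multiplicative Gaussian noise, and the `sigma` and `momentum` parameters are all assumptions made for illustration, since the abstract does not specify how the Gaussian sampling or the moving average is parameterized.

```python
import numpy as np


def fourier_augment(feat, rng, sigma=0.1):
    """Perturb the amplitude of a (C, H, W) feature map and keep its phase.

    Assumption: the 'style' perturbation is a per-channel multiplicative
    Gaussian factor with mean 1 and standard deviation `sigma`.
    """
    # 2D FFT over the spatial axes of every channel.
    spec = np.fft.fft2(feat, axes=(-2, -1))
    amp, phase = np.abs(spec), np.angle(spec)

    # Sample one style factor per channel and rescale the amplitude.
    scale = rng.normal(loc=1.0, scale=sigma, size=(feat.shape[0], 1, 1))
    amp_aug = amp * scale

    # Recombine the augmented amplitude with the original phase.
    spec_aug = amp_aug * np.exp(1j * phase)
    feat_aug = np.fft.ifft2(spec_aug, axes=(-2, -1)).real
    return feat_aug, phase


def ema_update(phase_bank, phase_new, momentum=0.9):
    """Exponential moving average over phase features from past frames.

    Assumption: phase features are averaged as plain arrays; angle
    wrap-around is ignored in this simplified sketch.
    """
    return momentum * phase_bank + (1.0 - momentum) * phase_new


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))           # toy feature map (C, H, W)
feat_aug, phase = fourier_augment(feat, rng, sigma=0.2)
phase_bank = ema_update(np.zeros_like(phase), phase)
```

Setting `sigma=0` leaves the feature map unchanged, which is a quick way to check that the transform and its inverse round-trip correctly before any perturbation is applied.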

Published

2024-03-24

How to Cite

Song, H., Su, T., Zheng, Y., Zhang, K., Liu, B., & Liu, D. (2024). Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4918-4924. https://doi.org/10.1609/aaai.v38i5.28295

Issue

Vol. 38 No. 5 (2024)

Section

AAAI Technical Track on Computer Vision IV