Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation

Huihui Song; Tiankang Su; Yuhui Zheng; Kaihua Zhang; Bo Liu; Dong Liu

doi:10.1609/aaai.v38i5.28295

Authors

Huihui Song B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China
Tiankang Su B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China
Yuhui Zheng B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China; College of Computer, Qinghai Normal University, Xining 810016, China
Kaihua Zhang B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, China
Bo Liu Walmart Global Tech, Sunnyvale, CA, 94086, USA
Dong Liu Netflix Inc, Los Gatos, CA, 95032, USA

DOI:

https://doi.org/10.1609/aaai.v38i5.28295

Keywords:

CV: Segmentation, ML: Unsupervised & Self-Supervised Learning

Abstract

Existing unsupervised video object segmentation methods typically suffer severe performance degradation when tested on out-of-distribution videos. The primary reason is that real-world test data may not satisfy the independent and identically distributed (i.i.d.) assumption, leading to domain shift. In this paper, we propose a generalizable Fourier augmentation method, applied during training, to improve the generalization ability of the model. To achieve this, we apply the Fast Fourier Transform (FFT) to the intermediate spatial-domain features in each layer to obtain the corresponding frequency representations, which comprise amplitude components (encoding scene-aware styles such as the texture, color, and contrast of the scene) and phase components (encoding rich semantics). We produce a variety of style features via Gaussian sampling to augment the training data, thereby improving generalization. To further improve cross-domain generalization, we design a phase feature update strategy that applies an exponential moving average to phase features from past frames in an online manner, which helps the model learn cross-domain-invariant features. Extensive experiments show that our proposed method achieves state-of-the-art performance on popular benchmarks.
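The two ideas in the abstract, amplitude (style) augmentation via Gaussian sampling and an exponential moving average over phase features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fourier_style_augment` and `ema_phase_update`, the parameters `sigma` and `momentum`, the per-channel mean/std resampling rule, and the choice to average phases on the unit circle are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def fourier_style_augment(feat, sigma=0.1, rng=None, eps=1e-6):
    """Resample the amplitude statistics of a (C, H, W) feature map, keeping its phase.

    Hypothetical helper: the per-channel mean/std resampling and `sigma`
    are illustrative assumptions, not the paper's formulation.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 2D FFT over the spatial axes, split into amplitude (style) and phase (semantics).
    spec = np.fft.fft2(feat, axes=(-2, -1))
    amp, phase = np.abs(spec), np.angle(spec)

    # Per-channel amplitude statistics.
    mu = amp.mean(axis=(-2, -1), keepdims=True)
    sd = amp.std(axis=(-2, -1), keepdims=True)

    # Gaussian sampling of new style statistics around the observed ones.
    mu_new = mu * (1.0 + sigma * rng.standard_normal(mu.shape))
    sd_new = sd * np.abs(1.0 + sigma * rng.standard_normal(sd.shape))

    # Re-normalize the amplitude to the sampled statistics; amplitudes must stay >= 0.
    amp_aug = np.clip((amp - mu) / (sd + eps) * sd_new + mu_new, 0.0, None)

    # Recombine with the untouched phase and return to the spatial domain.
    spec_aug = amp_aug * np.exp(1j * phase)
    return np.fft.ifft2(spec_aug, axes=(-2, -1)).real


def ema_phase_update(phase_ema, phase_new, momentum=0.9):
    """Exponential moving average of phase maps from past frames.

    Assumption: phases are averaged as unit complex numbers so that
    angles near +pi and -pi do not cancel each other out.
    """
    z = momentum * np.exp(1j * phase_ema) + (1.0 - momentum) * np.exp(1j * phase_new)
    return np.angle(z)
```

With `sigma=0` the augmentation reduces to (approximately) the identity, which makes it easy to check that only the amplitude statistics are being perturbed. In a video setting, `ema_phase_update` would be called once per frame with the running phase estimate and the phase of the current frame's features.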
