A Generalized Framework for Video Instance Segmentation

Heo, Miran; Hwang, Sukjun; Hyun, Jeongseok; Kim, Hanjung; Oh, Seoung Wug; Lee, Joon-Young; Kim, Seon Joo

Computer Science > Computer Vision and Pattern Recognition

arXiv:2211.08834 (cs)

[Submitted on 16 Nov 2022 (v1), last revised 24 Mar 2023 (this version, v2)]

Title: A Generalized Framework for Video Instance Segmentation

Authors: Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim


Abstract: The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at this https URL.
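The distinction the abstract draws between online and semi-online execution can be illustrated with a minimal sketch. This is not the GenVIS implementation: the names `split_into_clips`, `run_sequentially`, and `toy_step` are hypothetical, and the carried `state` is only a stand-in for whatever information a real model would pass from one frame or clip to the next. The sketch assumes that online execution corresponds to a clip length of 1 and semi-online execution to a clip length greater than 1, with state carried across consecutive clips.

```python
from typing import Any, Callable, List, Sequence, Tuple


def split_into_clips(frames: Sequence[Any], clip_len: int) -> List[Sequence[Any]]:
    """Split a video into consecutive, non-overlapping clips of at most clip_len frames."""
    if clip_len < 1:
        raise ValueError("clip_len must be >= 1")
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


def run_sequentially(
    frames: Sequence[Any],
    clip_len: int,
    step: Callable[[Sequence[Any], Any], Tuple[Any, Any]],
    init_state: Any,
) -> List[Any]:
    """Process clips in order, feeding the state returned for one clip into the next.

    clip_len == 1 mimics online (per-frame) execution;
    clip_len > 1 mimics semi-online (per-clip) execution.
    """
    state = init_state
    outputs = []
    for clip in split_into_clips(frames, clip_len):
        output, state = step(clip, state)  # hypothetical per-clip model call
        outputs.append(output)
    return outputs


def toy_step(clip: Sequence[Any], state: int) -> Tuple[int, int]:
    """Toy stand-in for a model: counts how many frames have been seen so far."""
    seen = state + len(clip)
    return seen, seen


frames = list(range(5))  # five dummy frames
online = run_sequentially(frames, clip_len=1, step=toy_step, init_state=0)
semi_online = run_sequentially(frames, clip_len=2, step=toy_step, init_state=0)
print(online)       # [1, 2, 3, 4, 5]
print(semi_online)  # [2, 4, 5]
```

The same loop serves both modes; only the clip length changes, which is the sense in which a method built around relationships between separate frames or clips can run either way.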

Comments:	CVPR 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2211.08834 [cs.CV]
	(or arXiv:2211.08834v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2211.08834

Submission history

From: Miran Heo
[v1] Wed, 16 Nov 2022 11:17:19 UTC (8,481 KB)
[v2] Fri, 24 Mar 2023 15:26:13 UTC (14,044 KB)
