PEEKABOO: Interactive Video Generation via Masked-Diffusion

Jain, Yash; Nasery, Anshul; Vineet, Vibhav; Behl, Harkirat

arXiv:2312.07509 (cs)

[Submitted on 12 Dec 2023 (v1), last revised 19 Apr 2024 (this version, v2)]

Abstract: Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applications and creativity. In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal control. We present Peekaboo, a novel masked attention module, which seamlessly integrates with current video generation models offering control without the need for additional training or inference overhead. To facilitate future research, we also introduce a comprehensive benchmark for interactive video generation. This benchmark offers a standardized framework for the community to assess the efficacy of emerging interactive video generation models. Our extensive qualitative and quantitative assessments reveal that Peekaboo achieves up to a 3.8x improvement in mIoU over baseline models, all while maintaining the same latency. Code and benchmark are available on the webpage.
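
The abstract reports its headline result in mIoU (mean intersection-over-union) but does not say how the metric is computed. As a rough illustration only, the sketch below assumes one axis-aligned bounding box per frame for both the requested region and the region where the generated object is detected, and averages the per-frame IoU. The names `Box`, `box_iou`, and `mean_iou` are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes yield an IoU of 0 rather than dividing by zero.
    return inter / union if union > 0 else 0.0


def mean_iou(detected: List[Box], requested: List[Box]) -> float:
    """Average per-frame IoU between detected and requested boxes."""
    if len(detected) != len(requested):
        raise ValueError("expected one box per frame in both lists")
    if not detected:
        return 0.0
    total = sum(box_iou(d, r) for d, r in zip(detected, requested))
    return total / len(detected)


if __name__ == "__main__":
    # Two frames: a perfect match, then a partial overlap.
    requested = [(0, 0, 2, 2), (1, 1, 3, 3)]
    detected = [(0, 0, 2, 2), (0, 0, 2, 2)]
    print(mean_iou(detected, requested))
```

Under this reading, a higher mIoU means the generated object stays closer to the region the user asked for across the video; the paper's benchmark may define the detection step and the averaging differently.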

Comments:	Project webpage - this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2312.07509 [cs.CV]
	(or arXiv:2312.07509v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.07509

Submission history

From: Yash Jain
[v1] Tue, 12 Dec 2023 18:43:05 UTC (19,593 KB)
[v2] Fri, 19 Apr 2024 22:38:48 UTC (27,295 KB)