Delving into CLIP latent space for Video Anomaly Recognition

Zanella, Luca; Liberatori, Benedetta; Menapace, Willi; Poiesi, Fabio; Wang, Yiming; Ricci, Elisa

arXiv:2310.02835 (cs)

[Submitted on 4 Oct 2023]

Abstract: We tackle the complex problem of detecting and recognising anomalies in surveillance videos at the frame level, utilising only video-level supervision. We introduce the novel method AnomalyCLIP, the first to combine Large Language and Vision (LLV) models, such as CLIP, with multiple instance learning for joint video anomaly detection and classification. Our approach specifically involves manipulating the latent CLIP feature space to identify the normal event subspace, which in turn allows us to effectively learn text-driven directions for abnormal events. When anomalous frames are projected onto these directions, they exhibit a large feature magnitude if they belong to a particular class. We also introduce a computationally efficient Transformer architecture to model short- and long-term temporal dependencies between frames, ultimately producing the final anomaly score and class prediction probabilities. We compare AnomalyCLIP against state-of-the-art methods considering three major anomaly detection benchmarks, i.e. ShanghaiTech, UCF-Crime, and XD-Violence, and empirically show that it outperforms baselines in recognising video anomalies.
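The projection idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the normal event subspace is summarised by the mean of normal-frame features, that each class direction points from that mean towards a class text embedding, and it uses made-up 3-D vectors in place of real CLIP features. All function and variable names are hypothetical.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def mean(vectors):
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def class_directions(text_embeddings, normal_prototype):
    """Unit directions from the normal prototype towards each class text embedding."""
    return {
        name: normalize(sub(embedding, normal_prototype))
        for name, embedding in text_embeddings.items()
    }

def project(frame_feature, normal_prototype, directions):
    """Signed length of the re-centred frame feature along each class direction."""
    centred = sub(frame_feature, normal_prototype)
    return {name: dot(centred, d) for name, d in directions.items()}

# Toy stand-ins for CLIP features (assumption: real features would come from CLIP encoders).
normal_frames = [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0], [1.0, -0.2, 0.0]]
normal_prototype = mean(normal_frames)

# Hypothetical class text embeddings.
text_embeddings = {
    "fighting": [1.0, 1.0, 0.0],
    "explosion": [1.0, 0.0, 1.0],
}
directions = class_directions(text_embeddings, normal_prototype)

anomalous_frame = [1.0, 0.9, 0.1]
normal_frame = [1.0, 0.05, 0.0]

anomalous_scores = project(anomalous_frame, normal_prototype, directions)
normal_scores = project(normal_frame, normal_prototype, directions)

# The class whose direction yields the largest magnitude is the toy prediction.
predicted_class = max(anomalous_scores, key=anomalous_scores.get)
print(predicted_class, anomalous_scores, normal_scores)
```

In this toy setup the anomalous frame has a large projection along one class direction while the normal frame stays close to zero along all of them, which mirrors the behaviour the abstract describes. The temporal Transformer and the multiple instance learning objective are not modelled here.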

Comments:	submitted to Computer Vision and Image Understanding, project website and code are available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.02835 [cs.CV]
	(or arXiv:2310.02835v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.02835

Submission history

From: Luca Zanella
[v1] Wed, 4 Oct 2023 14:01:55 UTC (7,851 KB)
