Taylor Videos for Action Recognition

Wang, Lei; Yuan, Xiuyuan; Gedeon, Tom; Zheng, Liang

arXiv:2402.03019 (cs)

[Submitted on 5 Feb 2024 (v1), last revised 8 Feb 2024 (this version, v3)]

Abstract: Effectively extracting motions from video is a critical and long-standing problem for action recognition. This problem is very challenging because motions (i) do not have an explicit form, (ii) have various concepts such as displacement, velocity, and acceleration, and (iii) often contain noise caused by unstable pixels. Addressing these challenges, we propose the Taylor video, a new video format that highlights the dominant motions (e.g., a waving hand) in each of its frames, named the Taylor frame. The Taylor video is named after the Taylor series, which approximates a function at a given point using important terms. In the scenario of videos, we define an implicit motion-extraction function which aims to extract motions from a video temporal block. In this block, using the frames, the difference frames, and higher-order difference frames, we perform Taylor expansion to approximate this function at the starting frame. We show that the summation of the higher-order terms in the Taylor series gives us dominant motion patterns, where static objects and small and unstable motions are removed. Experimentally, we show that Taylor videos are effective inputs to popular architectures including 2D CNNs, 3D CNNs, and transformers. When used individually, Taylor videos yield competitive action recognition accuracy compared to RGB videos and optical flow. When fused with RGB or optical flow videos, further accuracy improvement is achieved.
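
The abstract does not give the exact formula, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes grayscale frames, uses forward finite differences within a temporal block as stand-ins for temporal derivatives at the starting frame, and sums the terms of order 1 and above with 1/k! weights, dropping the zeroth-order term (the starting frame itself). The function names `taylor_frame` and `taylor_video`, the parameter `block_len`, and the stride-1 sliding window are all hypothetical choices made for illustration.

```python
import math
import numpy as np


def taylor_frame(block, num_terms=None):
    """Sketch of one Taylor frame from a temporal block of shape (T, H, W).

    Assumption: the k-th order forward difference at the starting frame
    approximates the k-th temporal derivative, and the Taylor frame is the
    sum of these differences weighted by 1/k! for k = 1..K.
    """
    block = np.asarray(block, dtype=np.float64)
    if block.ndim < 2 or block.shape[0] < 2:
        raise ValueError("block must contain at least two frames")
    max_order = block.shape[0] - 1
    K = max_order if num_terms is None else min(num_terms, max_order)

    frame = np.zeros_like(block[0])
    diff = block
    for k in range(1, K + 1):
        # After k iterations, diff holds the k-th order differences in time.
        diff = np.diff(diff, axis=0)
        # diff[0] is the k-th order difference anchored at the starting frame.
        frame += diff[0] / math.factorial(k)
    return frame


def taylor_video(video, block_len=4):
    """Slide a block of block_len frames over the video with stride 1."""
    video = np.asarray(video, dtype=np.float64)
    n_out = video.shape[0] - block_len + 1
    if n_out < 1:
        raise ValueError("video is shorter than block_len")
    return np.stack([taylor_frame(video[i:i + block_len]) for i in range(n_out)])
```

Under these assumptions a completely static clip yields all-zero Taylor frames, because every difference frame vanishes, which matches the abstract's statement that static objects are removed. How small and unstable motions are suppressed in the actual method is not specified in the abstract and is not modeled here.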

Comments:	Research report
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2402.03019 [cs.CV]
	(or arXiv:2402.03019v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.03019

Submission history

From: Lei Wang
[v1] Mon, 5 Feb 2024 14:00:13 UTC (34,028 KB)
[v2] Wed, 7 Feb 2024 05:50:11 UTC (34,028 KB)
[v3] Thu, 8 Feb 2024 02:34:21 UTC (34,029 KB)