Key frame extraction based on quaternion Fourier transform with multiple features fusion
Introduction
With the development of the internet, network transmission, and information technology, the cost of transmitting information keeps decreasing (Li, Yuan, & Chen, 2020). People are no longer satisfied with communicating through traditional text, voice, and pictures; they prefer more vivid and intuitive forms of expression, such as video (Persia, D'Auria, & Pilato, 2020). While technological progress brings convenience, it also brings a substantial growth in video data volume, especially in video surveillance, and maintaining public safety and social stability remains a challenging task. With the popularity of surveillance equipment, video surveillance systems play an irreplaceable role in maintaining public safety and order. Therefore, quickly browsing surveillance video and accurately identifying security threats has become a top priority (Abdalla et al., 2019, Satya Krishna et al., 2022). Furthermore, because surveillance cameras run around the clock, video data traffic shows an exponential growth trend (Fei et al., 2021, Javier Traver et al., 2022). At the same time, footage captured by a single device can contain much redundant content owing to its fixed background and lack of salient structural features (Cedillo-Hernandez et al., 2021, Chandrakala et al., 2022). Therefore, key frame extraction, which conveys the main content of footage by extracting a few representative frames from a surveillance video, has attracted extensive attention from researchers (Baek et al., 2021, Li and Zhou, 2021, Wang et al., 2013).
A key frame is a frame, or sequence of frames, that reflects the core content of a video (Huang and Wang, 2020, Li et al., 2018). Key frame extraction techniques are critical for handling large amounts of redundant video data and for improving both the efficiency and the precision of video retrieval. Existing key frame extraction techniques can be categorized into five types: (1) extraction based on shot boundaries (Wang, Guo, Wu, & Li, 2015); (2) extraction based on motion analysis (Zhu, Loy, & Gong, 2016); (3) extraction based on shot features (Lee & Grauman, 2015); (4) extraction based on clustering (Zhang, 2020); and (5) extraction based on deep learning (Wang, Bai, & Wu, 2021). Additionally, Yuan et al. proposed a key frame extraction method based on multi-feature adaptive threshold detection and then trained the feature data with deep learning to obtain gait sequences (Yuan, Xiao, & Li, 2015). Huang proposed a method based on the Phase Spectrum of Quaternion Fourier Transform (PQFT) for anomaly detection in crowds (Wang et al., 2015). The method represents spatiotemporal features as quaternions and reconstructs the final anomaly saliency map using the inverse QFT; it detects unusual events efficiently and robustly and satisfies real-time requirements. Guo et al. extended the image saliency detection method PQFT to video (Guo, Ma, & Zhang, 2008) by adding a motion feature, derived from the contrast difference between two related frames, to the quaternion features. Related methods that fuse multiple features for target recognition in images have also achieved significant results (Huang and Jing, 2020, Mao et al., 2019, Zhang et al., 2019).
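The PQFT idea described above can be sketched in a few lines. This is a minimal illustration, not the cited authors' code: it assumes the standard symplectic decomposition of the quaternion FFT (two ordinary complex FFTs), and the function and parameter names are our own.

```python
import numpy as np

def gaussian_smooth(x, sigma):
    """Low-pass a 2-D array with a Gaussian transfer function in the frequency domain."""
    fy = np.fft.fftfreq(x.shape[0])[:, None]
    fx = np.fft.fftfreq(x.shape[1])[None, :]
    h = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fy ** 2 + fx ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(x) * h))

def pqft_saliency(a, b, c, d, sigma=3.0):
    """Phase-spectrum-of-QFT saliency sketch.

    a, b, c, d: the four real channels of a quaternion image
    (e.g. motion, two color opponencies, intensity).
    Via the symplectic decomposition q = (a + b*i) + (c + d*i)*j,
    the quaternion FFT reduces to two complex FFTs.
    """
    f1 = np.fft.fft2(a + 1j * b)       # simplex part
    f2 = np.fft.fft2(c + 1j * d)       # perplex part
    # Quaternion modulus at each frequency; keep only the phase.
    mag = np.sqrt(np.abs(f1) ** 2 + np.abs(f2) ** 2) + 1e-12
    g1 = np.fft.ifft2(f1 / mag)        # inverse transform of the phase-only spectrum
    g2 = np.fft.ifft2(f2 / mag)
    # Squared modulus of the reconstructed quaternion, then Gaussian smoothing.
    sal = np.abs(g1) ** 2 + np.abs(g2) ** 2
    return gaussian_smooth(sal, sigma)
```

On a toy frame that is uniform except for a small patch, the phase-only reconstruction concentrates energy near the patch, which is the behavior PQFT exploits for saliency.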
Existing key frame extraction methods usually operate on video sequences after grayscale preprocessing, which destroys the integrity of the image information (Wang, 2020). In addition, current key frame extraction methods for surveillance video suffer from incomplete extraction of target details and inaccurate judgment of local actions (Zhang & Yang, 2014). To this end, this paper proposes a key frame extraction method for surveillance video based on the quaternion Fourier transform with multiple-feature fusion. The static features include color-opponent features, which represent the color information of the image, and brightness features, which correspond to human subjective brightness perception, so the complete image information is preserved. In addition, the spectrum of the quaternion Fourier transform carries rich texture information, which makes it possible to represent the local details of the target clearly and unambiguously. Next, Gaussian filtering is applied to extract the information describing the overall structure of the image, and the inverse quaternion Fourier transform yields the fused feature map. Finally, this paper constructs an adaptive key frame filtering criterion that accurately extracts key frames of surveillance video by comparing the fused feature maps of adjacent frames.
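One possible reading of an adaptive filtering criterion over adjacent fused feature maps can be sketched as follows; the mean-plus-k-standard-deviations threshold is our assumption for illustration, not the paper's exact formula.

```python
import numpy as np

def select_key_frames(feature_maps, k=1.0):
    """Adaptive key-frame selection sketch (illustrative, not the paper's criterion).

    feature_maps: list of 2-D fused feature maps, one per frame.
    A frame is selected when its mean absolute difference from the previous
    frame's map exceeds an adaptive threshold mean + k * std, computed over
    all adjacent-frame differences in the sequence.
    """
    diffs = np.array([float(np.mean(np.abs(b - a)))
                      for a, b in zip(feature_maps, feature_maps[1:])])
    thresh = diffs.mean() + k * diffs.std()
    # Frame i+1 is a key frame when the change from frame i is large enough;
    # the first frame is always kept as a reference.
    keys = [0] + [i + 1 for i, d in enumerate(diffs) if d > thresh]
    return keys, float(thresh)
```

Because the threshold is derived from the sequence's own difference statistics, it adapts to scenes with different overall activity levels rather than relying on a fixed constant.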
Section snippets
Quaternion
A quaternion, also called a hyper-complex number, is an extended form of the complex number. Hamilton first proposed quaternions, and Shoemake later introduced the theory into computer graphics (Bill et al., 2021, Wang et al., 2019). A quaternion q consists of four parts, one real part and three imaginary parts, and is expressed as (Shi & Funt, 2007):
q = a + b i + c j + d k
where a, b, c, and d are real numbers; a represents the real part, and b, c, and d are the coefficients of the three imaginary
Algorithm description
This paper uses a quaternion Fourier algorithm to extract features from the whole image, namely dynamic and static features, and then fuses these features. The proposed method aims to extract the overall structure of the image while capturing changes in image detail, and therefore extracts and fuses multi-angle features to obtain complete and clear information on the saliency of the target. The image contour information can be
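A hedged sketch of extracting dynamic and static channels from a pair of color frames follows; it uses the broadly tuned color-opponency definitions of Guo et al. (2008) as an assumed stand-in for the paper's exact feature definitions, and all names are illustrative.

```python
import numpy as np

def quaternion_channels(frame, prev_frame):
    """Build four real channels of a quaternion image from an RGB frame pair.

    Returns (motion, red-green opponency, blue-yellow opponency, intensity):
    one dynamic feature and three static features.
    """
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    intensity = (r + g + b) / 3.0
    # Broadly tuned color channels (Guo et al., 2008 style).
    R = r - (g + b) / 2.0
    G = g - (r + b) / 2.0
    B = b - (r + g) / 2.0
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b
    rg = R - G                                    # red-green opponency
    by = B - Y                                    # blue-yellow opponency
    motion = np.abs(intensity - prev_frame.mean(axis=-1))   # dynamic feature
    return motion, rg, by, intensity
```

The four returned channels are exactly the shape a quaternion transform expects: one real and three imaginary components per pixel, with no grayscale conversion that would discard color information.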
Experimental and results analysis
This paper evaluates the proposed key frame extraction method for surveillance video, based on the quaternion Fourier transform with multiple-feature fusion, from both subjective and objective perspectives.
The experimental results verify the correctness and effectiveness of the proposed method. The experiments use an AMD Ryzen 7 4800U processor with Radeon Graphics (1.80 GHz) under 64-bit Windows 10 Professional. Since the proposed method in
Conclusion
To solve the problem that existing key frame extraction methods for surveillance video cannot guarantee the integrity of image information or accurately capture the local motion state of the target, this paper proposes a key frame extraction method based on the quaternion Fourier transform with multiple-feature fusion. The method first extracts the dynamic and static features of the color surveillance video sequence. Then it obtains their fused phase
CRediT authorship contribution statement
Yunzuo Zhang: Conceptualization, Supervision, Project administration, Funding acquisition, Writing – review & editing. Jiayu Zhang: Conceptualization, Methodology, Investigation, Data curation, Writing – original draft, Writing – review & editing. Ruixue Liu: Software, Data curation, Validation. Pengfei Zhu: Conceptualization, Investigation, Writing – review & editing. Yameng Liu: Conceptualization, Resources, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is jointly supported by the National Natural Science Foundation of China (No.61702347, No.62027801), the Natural Science Foundation of Hebei Province (No.F2022210007, No.F2017210161), the Science and Technology Project of Hebei Education Department (No.ZD2022100, No.QN2017132), the Central Guidance on Local Science and Technology Development Fund (No.226Z0501G), the Shijiazhuang Tiedao University Graduate Innovation Funding Project (YC2022051).
References (39)
- et al. Robust hashing for image authentication using quaternion discrete Fourier transform and log-polar transform. Digital Signal Processing (2015).
- et al. Quaternion color texture segmentation. Computer Vision and Image Understanding (2007).
- et al. Multi-channel versus quaternion orthogonal rotation invariant moments for color image representation. Digital Signal Processing (2018).
- et al. Modelling perceptions on the evaluation of video summarization. Expert Systems with Applications (2019).
- et al. Pedestrian gender recognition by style transfer of visible-light image to infrared-light image based on an attention-guided generative adversarial network. Mathematics (2021).
- et al. Meta-heuristic optimization methods for quaternion-valued neural networks. Mathematics (2021).
- et al. Improving DFT-based image watermarking using particle swarm optimization algorithm. Mathematics (2021).
- Chandrakala, S., Deepak, K., Vignesh, L. Bag-of-Event-Models based embeddings for detecting anomalies in surveillance...
- et al. Full 4-D quaternion discrete Fourier transform based watermarking for color images. Digital Signal Processing (2014).
- et al. Video key frame extraction based on scale and direction analysis. The Journal of Engineering (2022).
- Color constancy using double-opponency. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. IEEE Conference on Computer Vision and Pattern Recognition.
- A novel key-frames selection framework for comprehensive video summarization. IEEE Transactions on Circuits and Systems for Video Technology.
- Predicting important objects for egocentric video summarization. International Journal of Computer Vision.
- Construction of network security situation indicator system for video private network. Journal of Beijing University of Aeronautics and Astronautics.
ORCID: 0000-0001-7499-4835.