Key frame extraction based on quaternion Fourier transform with multiple features fusion

https://doi.org/10.1016/j.eswa.2022.119467

Highlights

  • A novel key frame extraction method by QFT with multi-feature fusion is proposed.

  • A method that preserves the integrity of image information is proposed.

  • A new adaptive key frame selection criterion is constructed.

  • Our method outperforms dominant methods, especially in extracting local details.

Abstract

Key frame extraction, an important technique for improving video viewing efficiency and reducing video redundancy, has attracted considerable attention. However, existing key frame extraction methods for surveillance video not only fail to extract the local detail information of the target accurately but also preprocess images with grayscale conversion, which damages the integrity of the image information. To solve these problems, this paper presents a key frame extraction method based on the quaternion Fourier transform with multiple-feature fusion. First, the method separately extracts dynamic and static features that characterize the color image information. Then, a quaternion matrix is used to perform the quaternion Fourier transform on the different features, yielding a fused phase spectrum. Next, the information describing the overall structure of the image is filtered by Gaussian filtering, and the inverse quaternion Fourier transform is applied to obtain the fused feature map. Finally, an adaptive key frame selection criterion that operates on the fused feature map is constructed to extract key frames accurately. Experimental results indicate that the proposed method preserves the integrity of the image information while accurately capturing the global and local motion state of the target.

Introduction

With the development of the internet, network transmission, and information technology, the cost of information transmission is decreasing (Li, Yuan, Chen, & Li, 2020). People are no longer satisfied with communicating through traditional text, voice, and pictures; instead, they prefer more vivid and intuitive forms of expression, such as video communication (Persia, D'Auria, & Pilato, 2020). While technological progress brings convenience, it also brings a substantial growth in video data volume, especially in video surveillance, where maintaining public safety and social stability remains a challenging task. With the popularity of surveillance equipment, video surveillance systems play an irreplaceable role in maintaining public safety and order. Therefore, quickly browsing surveillance video and accurately identifying security risks has become a top priority (Abdalla et al., 2019, Satya Krishna et al., 2022). Furthermore, because surveillance systems operate around the clock, video data traffic shows an exponential growth trend (Fei et al., 2021, Javier Traver et al., 2022). At the same time, surveillance videos captured by a single device can contain much redundant content owing to the lack of distinctive structural features and the fixed background (Cedillo-Hernandez et al., 2021, Chandrakala et al., 2022). Therefore, the key frame extraction technique, which expresses the main content of footage by extracting a subset of frames from the surveillance video, has attracted extensive attention from researchers (Baek et al., 2021, Li and Zhou, 2021, Wang et al., 2013).

A key frame is a frame or sequence of frames that reflects the core content of a video (Huang and Wang, 2020, Li et al., 2018). Key frame extraction techniques are critically important for handling large amounts of redundant video data and for improving both the efficiency and the precision of video retrieval. Existing key frame extraction techniques can be categorized into five types: (1) extraction based on shot boundaries (Wang, Guo, Wu, & Li, 2015), (2) extraction based on motion analysis (Zhu, Loy, & Gong, 2016), (3) extraction based on shot features (Lee & Grauman, 2015), (4) extraction based on clustering (Zhang, 2020), and (5) extraction based on deep learning (Wang, Bai, & Wu, 2021). Additionally, Yuan et al. proposed a key frame extraction method based on multi-feature adaptive threshold detection and then used deep learning to train the feature data to obtain gait sequences (Yuan, Xiao, & Li, 2015). Huang proposed a method based on the Phase spectrum of the Quaternion Fourier Transform (PQFT) for anomaly detection in crowds (Wang et al., 2015). The method uses a quaternion feature representation for spatiotemporal anomaly detection and reconstructs the final anomaly saliency map using the inverse QFT; it detects unusual events efficiently and robustly while satisfying real-time requirements. Guo et al. extended the image saliency detection method PQFT to video (Guo, Ma, & Zhang, 2008) by adding motion features to the quaternion features; the added motion features are derived from the contrast difference between two related frames. Related methods that use multiple-feature fusion for target recognition in images have also achieved significant results (Huang and Jing, 2020, Mao et al., 2019, Zhang et al., 2019).
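The motion feature mentioned above, derived from the contrast difference between two related frames, can be sketched as a simple frame-difference channel. This is an illustrative reading of that idea, not the paper's exact formulation; the function name and the use of mean intensity are our assumptions.

```python
import numpy as np

def motion_feature(frame, prev_frame):
    """Illustrative motion channel: absolute intensity difference between
    two consecutive RGB frames (floats in [0, 1], shape H x W x 3).
    The exact contrast measure used in the cited work may differ."""
    # Reduce each RGB frame to a simple intensity map by channel averaging.
    intensity = frame.mean(axis=2)
    prev_intensity = prev_frame.mean(axis=2)
    # A static scene yields zeros; moving regions light up.
    return np.abs(intensity - prev_intensity)
```

A static background therefore contributes nothing to this channel, which is why it helps suppress the fixed-background redundancy typical of surveillance footage.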

Existing key frame extraction methods usually operate on video sequences after grayscale preprocessing, which destroys the integrity of the image information (Wang, 2020). In addition, current key frame extraction methods for surveillance video suffer from incomplete extraction of target details and inaccurate judgment of local actions (Zhang & Yang, 2014). To this end, this paper proposes a key frame extraction method for surveillance video based on the quaternion Fourier transform with multiple-feature fusion. Because the static features contain opponent-color neuron features that represent the color information of the image, as well as brightness features that correspond to human subjective brightness perception, the complete information of the image is preserved. In addition, the phase spectrum of the quaternion Fourier transform contains more information about the image texture, making it possible to represent the local details of the target more clearly. The information describing the overall structure of the image is then filtered by Gaussian filtering, and the inverse quaternion Fourier transform is applied to obtain the fused feature map. Finally, this paper constructs an adaptive key frame selection criterion that accurately extracts key frames of surveillance video by comparing the fused feature maps of two adjacent frames.
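The adaptive selection step compares the fused feature maps of adjacent frames. Since the exact criterion is not reproduced in this excerpt, the following is a hedged sketch of one plausible form: threshold the mean inter-frame difference adaptively at mean plus a multiple of the standard deviation. The function name, the difference measure, and the `alpha` parameter are all our assumptions.

```python
import numpy as np

def select_key_frames(feature_maps, alpha=1.0):
    """Hedged sketch of an adaptive key frame criterion: frame i is kept
    when the mean absolute difference between its fused feature map and
    that of frame i-1 exceeds mean + alpha * std of all such differences.
    The paper's actual criterion may be more elaborate."""
    diffs = np.array([
        np.abs(feature_maps[i] - feature_maps[i - 1]).mean()
        for i in range(1, len(feature_maps))
    ])
    # Adaptive threshold derived from the video's own difference statistics.
    threshold = diffs.mean() + alpha * diffs.std()
    return [i for i in range(1, len(feature_maps)) if diffs[i - 1] > threshold]
```

Because the threshold is computed from the sequence itself, no per-video tuning is needed, which matches the "adaptive" claim in the text.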

Section snippets

Quaternion

The quaternion, also called a hypercomplex number, is an extended form of the complex number. Hamilton initially proposed the quaternion, and Shoemake later introduced the theory into computer graphics (Bill et al., 2021, Wang et al., 2019). A quaternion q contains four parts: one real part and three imaginary parts. Its form is reproduced below (Shi & Funt, 2007):

q = a + bμ1 + cμ2 + dμ3

where a, b, c, d are real numbers, a being the real part; μi, i = 1, 2, 3 are three imaginary
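The multiplication rules of the imaginary units (μ1² = μ2² = μ3² = −1, μ1μ2 = μ3, with anticommutation) underlie the quaternion Fourier transform used later. As an illustration, the Hamilton product on (a, b, c, d) tuples can be written as follows; the function name `qmul` is ours.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples,
    i.e. a + b*mu1 + c*mu2 + d*mu3 with mu1^2 = mu2^2 = mu3^2 = -1
    and mu1*mu2 = mu3 (anticommutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,  # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,  # mu1 part
        a1*c2 - b1*d2 + c1*a2 + d1*b2,  # mu2 part
        a1*d2 + b1*c2 - c1*b2 + d1*a2,  # mu3 part
    )
```

Note that the product is not commutative: μ1μ2 = μ3 but μ2μ1 = −μ3, which is why the quaternion Fourier transform must fix which side the exponential multiplies on.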

Algorithm description

This paper uses a quaternion Fourier algorithm to extract features from the whole image, namely dynamic and static features, and then fuses these features. The method proposed in this paper aims to extract the overall structure of the image while capturing changes in image details, and therefore requires the extraction and fusion of multi-angle features to obtain complete and clear information on the saliency of the target. The image contour information can be

Experimental and results analysis

This paper conducts experiments on the key frame extraction method for surveillance video based on quaternion Fourier transform with multiple feature fusion from subjective and objective perspectives.

The experimental results verify the correctness and effectiveness of the proposed method. The experiments use an AMD Ryzen 7 4800U processor with Radeon Graphics at 1.80 GHz, running 64-bit Windows 10 Professional Edition. Since the proposed method in

Conclusion

In order to solve the problem that the existing key frame extraction method for surveillance video cannot guarantee the integrity of image information and cannot accurately extract the local motion state of the target, this paper proposes a key frame extraction method for surveillance video based on quaternion Fourier transform with multiple feature fusion. The method first extracts the dynamic and static features in the color surveillance video sequence. Then it obtains their fused phase

CRediT authorship contribution statement

Yunzuo Zhang: Conceptualization, Supervision, Project administration, Funding acquisition, Writing – review & editing. Jiayu Zhang: Conceptualization, Methodology, Investigation, Data curation, Writing – original draft, Writing – review & editing. Ruixue Liu: Software, Data curation, Validation. Pengfei Zhu: Conceptualization, Investigation, Writing – review & editing. Yameng Liu: Conceptualization, Resources, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is jointly supported by the National Natural Science Foundation of China (No. 61702347, No. 62027801), the Natural Science Foundation of Hebei Province (No. F2022210007, No. F2017210161), the Science and Technology Project of Hebei Education Department (No. ZD2022100, No. QN2017132), the Central Guidance on Local Science and Technology Development Fund (No. 226Z0501G), and the Shijiazhuang Tiedao University Graduate Innovation Funding Project (YC2022051).

References (39)

  • J. Ouyang et al., Robust hashing for image authentication using quaternion discrete Fourier transform and log-polar transform, Digital Signal Processing (2015)
  • L. Shi et al., Quaternion color texture segmentation, Computer Vision and Image Understanding (2007)
  • C. Singh et al., Multi-channel versus quaternion orthogonal rotation invariant moments for color image representation, Digital Signal Processing (2018)
  • K. Abdalla et al., Modelling perceptions on the evaluation of video summarization, Expert Systems with Applications (2019)
  • N. Baek et al., Pedestrian gender recognition by style transfer of visible-light image to infrared-light image based on an attention-guided generative adversarial network, Mathematics (2021)
  • J. Bill et al., Meta-heuristic optimization methods for quaternion-valued neural networks, Mathematics (2021)
  • M. Cedillo-Hernandez et al., Improving DFT-based image watermarking using particle swarm optimization algorithm, Mathematics (2021)
  • S. Chandrakala, K. Deepak, L. Vignesh, Bag-of-Event-Models based embeddings for detecting anomalies in surveillance...
  • B. Chen et al., Full 4-D quaternion discrete Fourier transform based watermarking for color images, Digital Signal Processing (2014)
  • Y. Dong et al., Video key frame extraction based on scale and direction analysis, The Journal of Engineering (2022)
  • M. Fei, W. Jiang, W. Mao, Learning user interest with improved triplet deep ranking and web-image priors for topic-related...
  • S. Gao et al., Color constancy using double-opponency, IEEE Transactions on Pattern Analysis and Machine Intelligence (2015)
  • C. Guo et al., Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform, IEEE Conference on Computer Vision and Pattern Recognition (2008)
  • Z. Huang, C. Jing, Super-resolution reconstruction method of remote sensing image based on multi-feature fusion, IEEE...
  • C. Huang et al., A novel key-frames selection framework for comprehensive video summarization, IEEE Transactions on Circuits and Systems for Video Technology (2020)
  • V. Javier Traver, D. Damen, Egocentric video summarisation via purpose-oriented frame scoring and selection, Expert...
  • Y. Lee et al., Predicting important objects for egocentric video summarization, International Journal of Computer Vision (2015)
  • X. Li et al., Construction of network security situation indicator system for video private network, Journal of Beijing University of Aeronautics and Astronautics (2020)
  • M. Li, X. Yuan, H. Chen, J. Li, Quaternion Discrete Fourier Transform-Based Color Image Watermarking Method Using...