Video reconstruction based on Intrinsic Tensor Sparsity model

https://doi.org/10.1016/j.image.2018.11.010

Highlights

  • We propose a tensor-sparsity-based reconstruction framework for video CS recovery that exploits nonlocal structured sparsity via sparse tensor approximation.

  • We propose a Gaussian Joint Sparsity (GJS) model that reconstructs the initial video sequence by exploiting frame-to-frame similarity.

  • An efficient ADMM algorithm is designed to solve the ITS-based reconstruction problem. Moreover, block CS simplifies the large matrix inversion that arises when solving for the video signal with the sparse tensor fixed.

Abstract

Natural images exhibit self-similarity, which can be used to improve image reconstruction. However, existing video reconstruction algorithms pay more attention to modeling and ignore the importance of priors in the reconstruction. In this paper, self-similarity is incorporated into the model when the video is reconstructed from temporally compressed video measurements. The proposed reconstruction model includes two parts. First, the video tensor sparsity model is formulated by applying a spatial–temporal tensor sparsity penalty to similar patches. The Intrinsic Tensor Sparsity (ITS) measure is used as the sparsity measure; it encodes the sparsity insights delivered by both the Tucker and CANDECOMP/PARAFAC (CP) decompositions of tensors. Second, 3D video patches are modeled with Gaussian Joint Sparsity (GJS), which exploits temporal similarity to obtain an initial image with distinct directional structure. GJS combines a statistical distribution with a joint sparsity model. The experimental results show that both the ITS-based and the GJS-based reconstruction models contribute to improving the quality of the video reconstruction.

Introduction

Compressive Sensing (CS) breaks through the Nyquist sampling limit and has brought revolutionary change to data acquisition technology [1], [2], for example in compressive imaging systems and cameras. One representative imaging system is the Single Pixel Camera (SPC) developed at Rice University [3]. With the advent of the SPC, imaging systems have been transformed drastically. In video sampling, CS is used to trade off the spatial and temporal resolution of cameras, and many video sampling methods [4], [5], [6] have been designed to capture videos with both high spatial and high temporal resolution. One effective sampling method is Coded Aperture Compressive Temporal Imaging (CACTI) [6], which reduces the bandwidth burden with low implementation complexity.
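As a rough illustration of the temporal compression that CACTI performs, the sketch below assumes the commonly used snapshot model in which each of T high-speed frames is modulated by its own binary mask and the modulated frames are summed into a single coded measurement. The function name `cacti_measure`, the random masks, and the array sizes are illustrative assumptions and are not taken from [6].

```python
import numpy as np

def cacti_measure(video, masks):
    """Collapse T frames into one coded snapshot: y = sum_t masks[t] * video[t].

    Assumed model: element-wise mask modulation followed by temporal summation.
    """
    video = np.asarray(video, dtype=float)
    masks = np.asarray(masks, dtype=float)
    if video.shape != masks.shape:
        raise ValueError("video and masks must share the shape (T, H, W)")
    return (masks * video).sum(axis=0)

# Hypothetical sizes: 8 frames of 16 x 16 pixels compressed into one snapshot.
rng = np.random.default_rng(0)
T, H, W = 8, 16, 16
video = rng.random((T, H, W))
masks = rng.integers(0, 2, size=(T, H, W))  # random binary coded apertures
snapshot = cacti_measure(video, masks)
print(snapshot.shape)
```

Under this assumed model the reconstruction task is to recover all T frames from the single `snapshot`, which is why strong priors on the video are needed.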

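With the CS theory, the original signal can be reconstructed from the CS measurement. For a sparse signal $x \in \mathbb{R}^{N}$ and a measurement matrix $\Phi \in \mathbb{R}^{M \times N}$ with $M \ll N$, we obtain the estimate of the original signal by solving the following optimization problem

$$\tilde{x} = \arg\min_{x} \|x\|_{0} \quad \text{s.t.} \quad \Phi x = y,$$

which can be written as

$$\tilde{x} = \arg\min_{x} \|\Phi x - y\|_{2} + \lambda \|x\|_{0},$$

where $\|\cdot\|_{2}$ is the l2-norm, $\|x\|_{2} = \sqrt{x_{1}^{2} + x_{2}^{2} + \cdots + x_{n}^{2}}$, and $\|\cdot\|_{0}$ is the l0-norm, that is, the total number of non-zero elements in a vector.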
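One simple way to attack the l0 problem numerically is iterative hard thresholding. The sketch below is a minimal illustration and not the method of this paper: it solves the closely related constrained variant (minimize the squared residual subject to at most k non-zero entries), and the names `hard_threshold` and `iht`, the step size, and the problem sizes are assumptions chosen for the example.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    out = np.zeros_like(x)
    if k > 0:
        keep = np.argpartition(np.abs(x), -k)[-k:]
        out[keep] = x[keep]
    return out

def iht(Phi, y, k, n_iter=300):
    """Iterative hard thresholding for min ||y - Phi x||_2^2 s.t. ||x||_0 <= k."""
    # Step 1 / ||Phi||_2^2 keeps the residual from increasing between iterations.
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + step * Phi.T @ (y - Phi @ x), k)
    return x

# Hypothetical test problem: a 4-sparse signal of length 100 from 40 measurements.
rng = np.random.default_rng(0)
M, N, k = 40, 100, 4
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
y = Phi @ x_true
x_hat = iht(Phi, y, k)
print("residual norm:", np.linalg.norm(y - Phi @ x_hat))
```

Exact recovery is not guaranteed in general; the sketch only shows how the sparsity constraint enters each iteration.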

Various methods have been proposed to solve the l0-norm optimization problem, such as evolutionary algorithms [7] and greedy algorithms [8], [9], [10], [11], [12]. The term $\|x\|_{0}$, a 'prior term', imposes a sparsity constraint on the signal. The l0-norm optimization problem is NP-hard. To address this, the l1-norm minimization problem was proposed, since it gives the same result as l0-norm minimization under certain conditions [13] and can be solved through analytical solvers, for instance the iterative shrinkage algorithms [14], [15] and Basis Pursuit (BP) [16].
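To make the l1 relaxation concrete, the following is a minimal sketch of an iterative shrinkage (ISTA-style) solver. It assumes the common objective 0.5·||Φx − y||² + λ||x||₁; the scaling of the data term, the value of `lam`, the iteration count, and the function names are assumptions for illustration rather than details from [14], [15].

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(Phi, y, lam, n_iter=500):
    """Iterative shrinkage for min_x 0.5 * ||Phi x - y||_2^2 + lam * ||x||_1."""
    lipschitz = np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)
        x = soft_threshold(x - grad / lipschitz, lam / lipschitz)
    return x

# Hypothetical test problem with a 5-sparse ground truth.
rng = np.random.default_rng(1)
M, N = 40, 100
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, size=5, replace=False)] = 1.0
y = Phi @ x_true
lam = 0.01
x_hat = ista(Phi, y, lam)
print("entries above 1e-3:", int(np.sum(np.abs(x_hat) > 1e-3)))
```

The only change relative to a hard-thresholding scheme is the shrinkage step, which is what makes the problem convex and the iteration well behaved.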

In addition to the sparse models mentioned above, Statistical Compressed Sensing (SCS) with Gaussian Mixture Models (GMMs), which works with general Bayesian models, has been proposed in recent years [17]. GMMs, which describe most real signals very well, have been applied to various image processing problems, such as classification [18], denoising [19], and CS reconstruction [20], [21]. Based on GMMs, we propose the Gaussian Joint Sparsity model to capture temporal similarity.
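For intuition about reconstruction under a Gaussian prior, the sketch below computes the standard MAP (equivalently, posterior-mean) estimate of a signal with prior N(μ, Σ) from linear measurements with white Gaussian noise. It covers a single Gaussian component only; a GMM-based method would additionally select or weight components, and the GJS model itself is not reproduced here. The function name, `noise_var`, and the sizes are assumptions.

```python
import numpy as np

def gaussian_map_estimate(Phi, y, mu, Sigma, noise_var=1e-6):
    """MAP estimate of x ~ N(mu, Sigma) from y = Phi x + noise.

    x_hat = mu + Sigma Phi^T (Phi Sigma Phi^T + noise_var * I)^{-1} (y - Phi mu)
    """
    gram = Phi @ Sigma @ Phi.T + noise_var * np.eye(Phi.shape[0])
    return mu + Sigma @ Phi.T @ np.linalg.solve(gram, y - Phi @ mu)

# Hypothetical example: 16-dimensional patch observed through 8 measurements.
rng = np.random.default_rng(0)
n, m = 16, 8
mu = np.zeros(n)
A = rng.standard_normal((n, n))
Sigma = A @ A.T / n + 0.1 * np.eye(n)  # symmetric positive definite covariance
Phi = rng.standard_normal((m, n))
x_true = rng.multivariate_normal(mu, Sigma)
y = Phi @ x_true
x_hat = gaussian_map_estimate(Phi, y, mu, Sigma)
print("measurement mismatch:", np.linalg.norm(Phi @ x_hat - y))
```

Because the estimate is available in closed form, the cost per patch is dominated by one m × m linear solve.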

As video reconstruction has attracted more attention, a series of algorithms have been proposed to reconstruct the video sequence, such as Generalized Alternating Projection (GAP) [22] and the GMM-based algorithm [21]. The GMM-based algorithm, proposed by Yang et al., successfully applies a GMM to model spatial–temporal 3D video patches, yet it ignores the spatial and temporal similarity within the video.

It is well known that still images have geometric self-similarity, and this is especially true for video sequences. Many still image restoration methods improve the reconstruction by using spatial similarity [23], [24], [25], [26]. In this paper, a reconstruction model based on Intrinsic Tensor Sparsity (ITS) and the Gaussian Joint Sparsity (GJS) model is proposed to exploit both the spatial and the temporal similarity of the video sequence. The innovations are briefly described as follows: (1) We propose a tensor-sparsity-based reconstruction framework for video CS recovery that exploits nonlocal structured sparsity via sparse tensor approximation. In this model, similar 2D patches are searched for in the spatial–temporal domain, and ITS is used as the tensor sparsity measure, taking full advantage of the redundancy. (2) We propose a Gaussian Joint Sparsity (GJS) model that reconstructs the initial video sequence by exploiting frame-to-frame similarity. In this model, the 2D image blocks at the same position in adjacent frames are assumed to have the same structure and to obey the same Gaussian distribution. (3) An efficient ADMM algorithm is designed to solve the ITS-based reconstruction problem. Moreover, block CS simplifies the large matrix inversion that arises when solving for the video signal with the sparse tensor fixed. When reconstructing the video, a reliable initial video sequence is obtained by GJS, and then the ITS-based video reconstruction model is applied to improve the reconstruction results. The two models work together and improve the reconstruction results effectively.
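The search for similar 2D patches in the spatial–temporal domain can be sketched as a small block-matching routine. This is an illustrative assumption about how such a search might look, not the paper's procedure: the patch size, search radius, squared-error distance, number of matches `k`, and the function name `group_similar_patches` are all hypothetical.

```python
import numpy as np

def group_similar_patches(video, t0, r0, c0, patch=4, search=3, k=5):
    """Stack the k patches most similar to the reference into a 3D tensor.

    video has shape (T, H, W); the reference patch starts at (t0, r0, c0).
    Candidates come from a window of +/- search frames, rows, and columns.
    """
    T, H, W = video.shape
    ref = video[t0, r0:r0 + patch, c0:c0 + patch]
    candidates = []
    for t in range(max(0, t0 - search), min(T, t0 + search + 1)):
        for r in range(max(0, r0 - search), min(H - patch, r0 + search) + 1):
            for c in range(max(0, c0 - search), min(W - patch, c0 + search) + 1):
                cand = video[t, r:r + patch, c:c + patch]
                candidates.append((float(np.sum((cand - ref) ** 2)), t, r, c))
    candidates.sort(key=lambda item: item[0])  # smallest distance first
    best = candidates[:k]
    return np.stack([video[t, r:r + patch, c:c + patch] for _, t, r, c in best], axis=2)

# Hypothetical video: 6 frames of 16 x 16 pixels.
rng = np.random.default_rng(0)
video = rng.random((6, 16, 16))
group = group_similar_patches(video, t0=2, r0=5, c0=5)
print(group.shape)
```

Each resulting group is a patch-height × patch-width × k tensor, which is the kind of object a tensor sparsity penalty would then act on.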

The rest of the paper is organized as follows. Section 2 introduces some notions and related work. Section 3 describes the video reconstruction model based on ITS and the initialization method based on GJS. Section 4 reports the experimental results. Finally, conclusions are given in Section 5.

Section snippets

Notions and related work

This work involves tensors, the ITS measure, and CACTI. Next, we review these three categories of related work.
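The exact definition of the ITS measure is not given in this snippet. As a hedged sketch, the code below assumes the form commonly reported in the tensor sparsity literature: a weighted sum of the number of non-zero entries in the Tucker (HOSVD) core and the product of the ranks of the mode unfoldings. The weight `t`, the relative tolerance, and the function names are assumptions for illustration only.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_core(tensor):
    """Tucker core from the higher-order SVD (factors are left singular vectors)."""
    core = tensor.copy()
    for mode in range(tensor.ndim):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        # Mode-n product with U^T: contract U^T with axis `mode` of the core.
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    return core

def its_measure(tensor, t=0.5, rel_tol=1e-8):
    """Assumed ITS form: t * nnz(core) + (1 - t) * prod_n rank(unfold_n)."""
    core = hosvd_core(tensor)
    nnz_core = int(np.sum(np.abs(core) > rel_tol * np.abs(core).max()))
    rank_product = 1
    for mode in range(tensor.ndim):
        s = np.linalg.svd(unfold(tensor, mode), compute_uv=False)
        rank_product *= int(np.sum(s > rel_tol * s[0]))
    return t * nnz_core + (1.0 - t) * rank_product

# A rank-one tensor is maximally sparse under this measure.
a, b, c = np.arange(1.0, 5.0), np.arange(1.0, 6.0), np.arange(1.0, 7.0)
rank_one = np.einsum("i,j,k->ijk", a, b, c)
dense = np.random.default_rng(0).random((4, 5, 6))
print(its_measure(rank_one), its_measure(dense))
```

Both terms are non-convex counts, so a practical solver would typically work with relaxed surrogates rather than evaluate this measure directly.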

Video reconstruction based on ITS and GJS

Natural images have self-similarity, which can be used to improve the image reconstruction result. There are three types of similarity in video sequences:

  • 1. For one frame of the video, an image block can find its similar blocks in the same frame (see the upper-right boxes of Fig. 3). There is spatial redundancy in each frame of the videos.

  • 2. For the static scenes in the video, if a scene occurs at one frame, the same scene will occur in the same position in the adjacent frames (see the

Experimental results

In this section, four methods are compared: one is the GMM algorithm proposed in [21], and the other three incorporate the proposed methods.

GMM: The algorithm proposed in paper [21].

GJS_PLE: It is the proposed initialization method in Section 3.2. The details can be seen in Algorithm 2.

GMM_ITS: GMM is used to obtain an initial video sequence. Then the proposed video reconstruction model based on ITS in Algorithm 1 is used to improve the video sequences.

GJS_PLE_ITS:

Conclusion

Our paper proposes a video reconstruction algorithm based on ITS and GJS with the CACTI measurement. By transforming the video reconstruction problem into a tensor sparsity approximation problem, the proposed algorithm enjoys the following advantages: (i) The tensor sparsity model adequately captures the self-similarity of video by using a spatial–temporal tensor sparsity penalty. The ITS measure which finely encodes the correlation insights under the known Tucker and CP decomposition for tensors

Acknowledgments

This work was supported in part by the State Key Program of National Natural Science of China (No. 61836009), the National Natural Science Foundation of China (No. 61871310, No. 61573267, No. 61771376, No. 61801350, No. 61876220), in part by the Equipment Pre Research Field Foundation of China (No. 61403120101), in part by the Program for Cheung Kong Scholars and Innovative Research Team in University, China (No. IRT_15R53), in part by The Fund for Foreign Scholars in University Research and

References (33)

  • Mallat S.G. et al., Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process. (1993)

  • Tropp J.A. et al., Signal recovery from random measurements via orthogonal matching pursuit, IEEE Trans. Inform. Theory (2007)

  • Donoho D.L. et al., Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit, IEEE Trans. Inform. Theory (2012)

  • Hinton G. et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Process. Mag. (2012)

  • Donoho D.L., For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution, Comm. Pure Appl. Math. (2006)

  • Blumensath T. et al., Iterative hard thresholding for compressed sensing, Appl. Comput. Harmon. Anal. (2008)