A video watermark algorithm based on tensor decomposition

Most previous video watermark algorithms regard a video as a series of consecutive images and perform watermark embedding and extraction on these images, without considering the correlation and redundancy among the frames of the video. Such algorithms are weak against frame attacks. To improve robustness, we take the correlation and redundancy among frames into account and propose a blind video watermark algorithm based on tensor decomposition. First, a grayscale video is represented as a 3-order tensor, and the core tensor is obtained by tensor decomposition. Second, the watermark embedding position is selected based on the stability of the maximum value in the core tensor, because the core tensor represents the main energy of a video. Then, the watermark is embedded by quantizing the maximum value in the core tensor. Finally, the watermark is uniformly distributed across the frames of the video by inverse tensor decomposition. Experiments show that the proposed algorithm achieves good imperceptibility and is robust against common video attacks.


Introduction
Copyright protection is becoming increasingly important as digital videos become popular. Video watermark technology is digital watermark technology with video as the carrier. It embeds confidential information into the carrier by exploiting the video's redundancy. Video watermarks are invisible and able to resist malicious attacks, and they have been used for video copyright protection and video content authentication [1].
Video watermark technology is classified into spatial domain watermark technology and transform domain watermark technology. Kalker [2] introduced spread spectrum into video watermark and proposed a classic video watermark algorithm for video broadcast monitoring. The algorithm regards a video as a sequence of images and embeds the watermark into the video frames in the spatial domain. The algorithm works well with broadcast transmission signal processing and produces a detection process with low complexity. But it is not robust against common attacks. The spatial video watermark algorithm proposed by Hartung [3] converts the original video image into a one-dimensional signal and modulates the watermark into a pseudo-random sequence. This watermark is embedded into the one-dimensional signal. This classical spatial domain algorithm has disadvantages in robustness against attacks such as video compression and filtering. Karybali [4] proposed an effective spatial watermark algorithm to improve robustness by perceptual masking and watermark blind extraction.
A spatial watermark algorithm directly modifies the pixels of an image in the spatial domain. Its advantages are good transparency and low complexity, whereas its main disadvantage is watermark loss after image compression or geometric attack [5][6][7]. A transform domain watermark algorithm first transforms an image with the Discrete Wavelet Transform (DWT), the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), or a similar transform, and embeds the watermark after the transform. Transform domain watermark technology is robust against attacks such as filtering and image compression. In 1995, Koch proposed a watermark algorithm in the DCT domain that is robust against compression and filtering attacks [8]. In 1997, Cox [9] summarized and analyzed the existing transform domain watermark algorithms and proposed embedding the watermark into the low frequency coefficients of an image, which effectively improved the robustness of the watermark. To further improve anti-attack capability, Chandra [10] first used Singular Value Decomposition (SVD) for digital watermarking in 2001 and embedded the watermark image into the singular values of the carrier image. Digital watermark algorithms based on SVD improve anti-attack capability, especially against geometric attacks.
Most video watermark algorithms so far regard a video as a series of consecutive images and embed the watermark into these images, without considering the correlation and redundancy among the frames of a video. As a result, they are not robust against frame attacks such as frame addition, frame deletion, or frame averaging.
We introduce tensors into video watermarking in order to solve the above problems. A tensor is a multi-dimensional array and has advantages in representing multi-dimensional data. Tensor computation has been successfully used in face recognition [11], visual tracking [12] and action classification [13]. Tensor decomposition [14][15][16] has become an important tool in video-related research. However, published studies on tensor-based video watermarking are rare [17,18]. In [17], Abdallah proposed a tensor-based video watermarking algorithm, but it is non-blind. Recently, Xu [18] represented a color image as a third-order tensor and proposed a robust watermark algorithm, but it is only suitable for color images. In this study, we represent a grayscale video as a 3-order tensor and propose a blind video watermark algorithm based on tensor decomposition. The flow chart of our algorithm is shown in Figure 1. The core tensor is obtained by Tucker decomposition of the 3-order tensor. The watermark is embedded by parity quantization [19] of the maximum value of the core tensor. The modification of the core tensor is uniformly distributed across the frames of a video by inverse Tucker decomposition. Watermark extraction is simply the inverse process of embedding. Experiments show that our algorithm is imperceptible and robust against common video attacks.
The main contributions of this study are as follows: (1) We introduce tensor into video watermark. The correlation and redundancy among the frames of a video are considered to enhance the robustness against frame attacks.
(2) The stability of the algorithm is guaranteed because Tucker decomposition is reversible, and the core tensor represents the main energy of the original video and is relatively invariant.
(3) The modified core tensor is uniformly distributed among the frames of a video by inverse Tucker decomposition, so that the video quality and the imperceptibility of watermark are guaranteed.
This paper is organized as follows. The basics of tensor are introduced in Section 2, our watermark embedding and extraction algorithm is described in Section 3, and the experiment results are shown in Section 4.

Tensor
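The following notation is used. Scalar: lowercase letters ($a$, $b$, etc.); Vector: boldface lowercase letters ($\mathbf{a}$, $\mathbf{b}$, etc.); Matrix: uppercase letters ($A$, $B$, etc.); High-order tensor: calligraphic letters ($\mathcal{A}$, $\mathcal{B}$, etc.). A tensor is a high-order matrix, an extended form of a matrix toward higher dimensions. Vectors and matrices are first-order and second-order tensors, respectively. An N-order tensor is defined as $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. A video sequence can be regarded as a 3-order tensor whose three dimensions are the width, height and length (number of frames) of the video.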

Tensor unfolding
Tensors have many advantages in representing multi-dimensional data. For example, if a video is treated as a tensor, the properties of the original video are preserved to the maximum extent. However, computing directly on high-order tensors is expensive, so a tensor is usually unfolded into matrices for easier computation [20].
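Let $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ be an N-order tensor. The tensor $\mathcal{A}$ is unfolded along mode n into the matrices $A_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$, $n = 1, 2, \ldots, N$. The unfolding of a 3-order tensor is shown in Figure 2.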

The mode-n product of tensor and matrix
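The mode-n product of an N-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a matrix $U \in \mathbb{R}^{J_n \times I_n}$ is denoted as $\mathcal{A} \times_n U$, where $\mathcal{A} \times_n U \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J_n \times I_{n+1} \times \cdots \times I_N}$, and its entries are given by:

$$(\mathcal{A} \times_n U)_{i_1 \cdots i_{n-1}\, j_n\, i_{n+1} \cdots i_N} = \sum_{i_n = 1}^{I_n} a_{i_1 i_2 \cdots i_N}\, u_{j_n i_n}$$

The mode-1 product of a 3-order tensor and a matrix is shown in Figure 3.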
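The mode-n product can be sketched in NumPy with `np.tensordot`, which contracts the second index of the matrix with the chosen mode of the tensor. The function name `mode_n_product` and the 0-based mode index are assumptions of this sketch; the `einsum` line spells out the entrywise definition for mode 1 as a cross-check.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along `mode` (0-based).

    `matrix` has shape (J, I_mode); the result has J in place of I_mode.
    """
    contracted = np.tensordot(matrix, tensor, axes=(1, mode))  # J becomes axis 0
    return np.moveaxis(contracted, 0, mode)                    # move J back to `mode`

rng = np.random.default_rng(0)
A = rng.random((3, 4, 5))   # 3-order tensor
U = rng.random((2, 3))      # maps the first mode from size 3 to size 2

B = mode_n_product(A, U, 0)

# Entrywise definition for the first mode: b[j,k,l] = sum_i a[i,k,l] * u[j,i]
expected = np.einsum("ji,ikl->jkl", U, A)
```

In matrix terms, the mode-n unfolding of the product equals the matrix times the mode-n unfolding of the tensor.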

Tensor decomposition
Tensor decomposition generalizes the matrix singular value decomposition (SVD) to higher orders [21]. An image F of size $M \times N$ can be decomposed by SVD as follows:

$$F = U S V^{T}$$

where U and V are the left and right singular matrices of F, respectively, and S is the diagonal matrix composed of the singular values of F. A matrix can be regarded as a 2-order tensor. According to the definition of the mode product of a tensor and a matrix, the SVD of a matrix can be written as the mode-1 product of the 2-order tensor S with U followed by the mode-2 product with V:

$$F = S \times_1 U \times_2 V$$

with $U \in \mathbb{R}^{M \times M}$ and $V \in \mathbb{R}^{N \times N}$.

For a tensor of order higher than 2, the CP (CANDECOMP/PARAFAC) decomposition [22] and the Tucker decomposition [23] are often used. CP decomposes a tensor into a finite sum of rank-1 tensors. The CP decomposition is unique under mild conditions, but determining the rank of a tensor is NP-hard. Tucker decomposition expresses a tensor as the mode product of a core tensor and factor matrices. The core tensor contains the main information of the original tensor. Tucker decomposition is used in our study. High-Order Singular Value Decomposition (HOSVD) is the classic algorithm for Tucker decomposition. Given a tensor $\mathcal{A}$ of size $I_1 \times I_2 \times I_3$, Tucker decomposition with HOSVD [24] is as follows:

$$A_{(1)} = U^{(1)} \Sigma^{(1)} V^{(1)T} \tag{7}$$

$$A_{(2)} = U^{(2)} \Sigma^{(2)} V^{(2)T} \tag{8}$$

$$A_{(3)} = U^{(3)} \Sigma^{(3)} V^{(3)T} \tag{9}$$

where $A_{(1)}$, $A_{(2)}$, $A_{(3)}$ are the unfolding matrices of $\mathcal{A}$ in the three directions, respectively. The core tensor is:

$$\mathcal{S} = \mathcal{A} \times_1 U^{(1)T} \times_2 U^{(2)T} \times_3 U^{(3)T} \tag{10}$$

Tucker decomposition of a 3-order tensor is shown in Figure 4, where $U^{(1)}$, $U^{(2)}$, and $U^{(3)}$ are the factor matrices of the three modes.

Figure 4. Tucker decomposition of a 3-order tensor.
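The HOSVD steps above can be sketched in a few lines of NumPy. This is an illustrative, untruncated HOSVD under stated assumptions: the helper names (`unfold`, `mode_n_product`, `hosvd`, `reconstruct`) are hypothetical, modes are 0-based, and the toy tensor (a bright constant level plus small noise) merely imitates strongly correlated frames.

```python
import numpy as np

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(tensor):
    """Return (core, factors): factors are left singular vectors of each unfolding."""
    factors = [np.linalg.svd(unfold(tensor, n), full_matrices=False)[0]
               for n in range(tensor.ndim)]
    core = tensor
    for n, u in enumerate(factors):
        core = mode_n_product(core, u.T, n)   # core = A x_n U^(n)T for every mode
    return core, factors

def reconstruct(core, factors):
    """Inverse Tucker decomposition: core x_1 U^(1) x_2 U^(2) x_3 U^(3)."""
    out = core
    for n, u in enumerate(factors):
        out = mode_n_product(out, u, n)
    return out

rng = np.random.default_rng(0)
A = 100.0 + 10.0 * rng.random((8, 6, 4))   # toy stand-in for a group of frames

core, factors = hosvd(A)
A_rec = reconstruct(core, factors)

# Position of the largest-magnitude core entry (sign depends on the SVD routine).
peak = tuple(int(i) for i in np.unravel_index(np.argmax(np.abs(core)), core.shape))
```

For data of this kind almost all of the energy concentrates in the first core entry, which is the property the embedding step relies on. Because the sign of singular vectors is arbitrary, the sketch looks at the magnitude of the core entries rather than their signed value.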

A watermark algorithm based on tensor decomposition
In our watermark algorithm, a video is initially represented as a tensor, which contains the relevance and redundancy among the frames of a video. Then, a core tensor is obtained by Tucker decomposition. Finally, a watermark is embedded into the core tensor by parity quantization.

The process of watermark embedding
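The resolution of a video V is $M \times N$, and the size of a binary watermark B is $m \times m$. To make full use of the relevance and redundancy among the frames of a video, every K frames of the grayscale video are grouped as a 3-order tensor $\mathcal{A}_k$ of size $M \times N \times K$.
The core tensor $\mathcal{S}_k$ and the 3 factor matrices $U^{(1)}$, $U^{(2)}$, $U^{(3)}$ of each $\mathcal{A}_k$ are obtained through Tucker decomposition with HOSVD. The process of watermark embedding is as follows.
(1) Arnold scrambling. In order to eliminate the spatial correlation among the binary watermark pixels, B is transformed into the scrambled watermark $\tilde{B}$ by the Arnold transformation, which is defined by the following equation:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & a \\ b & ab + 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod m \tag{11}$$

where $(x, y)$ is the coordinate of the original watermark pixel, $(x', y')$ is the coordinate of $(x, y)$ after the Arnold transformation, and m is the width of the watermark matrix. In the experiment, a = 1 and b = 1. The Arnold transformation is performed t times on the original watermark, and t is saved as a key for watermark extraction.
(2) Tucker decomposition with HOSVD. Tucker decomposition is performed for each $\mathcal{A}_k$ to obtain the core tensor $\mathcal{S}_k$.
(3) Quantization and modification of the core tensor. Parity quantization is used to embed the watermark into the core tensor. For each tensor $\mathcal{A}_k$, the maximum value of the core tensor $\mathcal{S}_k$ is denoted as $\sigma_k$.
a. Quantize the maximum value $\sigma_k$ of each core tensor with step Q to obtain an integer $\lambda_k$, where Q is the quantization intensity; the value of Q is discussed below.
b. Modify the maximum value of each core tensor to embed one watermark bit: $\sigma_k$ is replaced by a value $\sigma'_k$ whose quantized value has the parity that encodes the bit (even for a bit of 1 and odd for a bit of 0, matching the rule used in extraction). The result is the modified core tensor $\mathcal{S}'_k$.
(4) Reconstruction of the watermarked video. The watermarked video is reconstructed by inverse Tucker decomposition with the modified core tensor, $\mathcal{A}'_k = \mathcal{S}'_k \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}$.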
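The following sketch illustrates steps (2) to (4) for a single group of frames, together with the matching parity read-out so the round trip can be checked. It is not the authors' exact rule: rounding to the nearest multiple of Q, moving to the nearest integer of the required parity, working on the largest-magnitude core entry, and the names `embed_bit` and `extract_bit` are all assumptions of this example. Only the parity convention (even encodes 1, odd encodes 0) is taken from the text. Pixel rounding and clipping of the reconstructed frames are omitted.

```python
import numpy as np

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(tensor):
    factors = [np.linalg.svd(unfold(tensor, n), full_matrices=False)[0]
               for n in range(tensor.ndim)]
    core = tensor
    for n, u in enumerate(factors):
        core = mode_n_product(core, u.T, n)
    return core, factors

def embed_bit(group, bit, q):
    """Embed one bit into a group of frames (assumed rule, see lead-in)."""
    core, factors = hosvd(group)
    idx = np.unravel_index(np.argmax(np.abs(core)), core.shape)
    sign = 1.0 if core[idx] >= 0 else -1.0
    ratio = abs(core[idx]) / q
    level = int(np.round(ratio))
    # Even level encodes 1, odd level encodes 0; fix the parity if needed
    # by stepping to the nearest integer with the required parity.
    if (level % 2 == 0) != (bit == 1):
        level += 1 if ratio >= level else -1
    core = core.copy()
    core[idx] = sign * level * q
    marked = core
    for n, u in enumerate(factors):          # inverse Tucker decomposition
        marked = mode_n_product(marked, u, n)
    return marked

def extract_bit(group, q):
    """Blind read-out: parity of the quantized core maximum."""
    core, _ = hosvd(group)
    level = int(np.round(np.max(np.abs(core)) / q))
    return 1 if level % 2 == 0 else 0

rng = np.random.default_rng(1)
group = 100.0 + 10.0 * rng.random((8, 8, 4))   # toy group of 4 correlated frames
q = 20.0                                        # toy step, not the paper's Q
recovered = [extract_bit(embed_bit(group, bit, q), q) for bit in (0, 1)]
max_change = max(np.max(np.abs(embed_bit(group, bit, q) - group)) for bit in (0, 1))
```

Here the change of at most Q in one core entry is spread over all 8 × 8 × 4 pixels by the factor matrices, so each pixel moves by only a small amount. This is the sense in which the watermark is distributed uniformly across frames.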

The process of extracting watermark
Watermark extraction is the inverse process of watermark embedding. The specific steps of watermark extraction are as follows.
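(1) Tucker decomposition is performed on each watermarked video tensor $\hat{\mathcal{A}}_k$ to obtain the core tensor $\hat{\mathcal{S}}_k$:

$$\hat{\mathcal{S}}_k = \hat{\mathcal{A}}_k \times_1 \hat{U}^{(1)T} \times_2 \hat{U}^{(2)T} \times_3 \hat{U}^{(3)T} \tag{15}$$

where $\hat{U}^{(1)}$, $\hat{U}^{(2)}$, $\hat{U}^{(3)}$ are the factor matrices of $\hat{\mathcal{A}}_k$.
(2) The extracted watermark is determined according to the maximum value $\hat{\sigma}_k$ of the core tensor $\hat{\mathcal{S}}_k$.
a. Quantize the maximum value $\hat{\sigma}_k$ of each core tensor with the same step Q as in embedding to obtain an integer $\hat{\lambda}_k$.
b. Determine the extracted bit according to the parity of $\hat{\lambda}_k$: the bit is 1 when $\hat{\lambda}_k$ is even and 0 when $\hat{\lambda}_k$ is odd.
(3) Perform the inverse Arnold transformation t times on the extracted bits, using the saved key t, to recover the watermark B.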
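Step (3) depends on the Arnold transformation being exactly invertible once the key t is known. The sketch below shows one way to implement the scrambling and its inverse for a square binary watermark with a = b = 1. The function names are hypothetical, and the coordinate map assumes the common generalized form x' = (x + a·y) mod m, y' = (b·x + (ab + 1)·y) mod m.

```python
import numpy as np

def _arnold_coords(m, a, b):
    """Source coordinates (x, y) and their Arnold images (nx, ny) on an m x m grid."""
    x, y = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    nx = (x + a * y) % m
    ny = (b * x + (a * b + 1) * y) % m
    return x, y, nx, ny

def arnold(img, times, a=1, b=1):
    """Scramble a square image by applying the Arnold map `times` times."""
    x, y, nx, ny = _arnold_coords(img.shape[0], a, b)
    out = img
    for _ in range(times):
        scrambled = np.empty_like(out)
        scrambled[nx, ny] = out[x, y]     # pixel (x, y) moves to (nx, ny)
        out = scrambled
    return out

def inverse_arnold(img, times, a=1, b=1):
    """Undo `times` applications of the Arnold map."""
    x, y, nx, ny = _arnold_coords(img.shape[0], a, b)
    out = img
    for _ in range(times):
        restored = np.empty_like(out)
        restored[x, y] = out[nx, ny]      # pixel at (nx, ny) returns to (x, y)
        out = restored
    return out

rng = np.random.default_rng(2)
watermark = rng.integers(0, 2, size=(18, 18))   # random 18 x 18 binary watermark
t = 15                                          # number of iterations (the key)

scrambled = arnold(watermark, t)
restored = inverse_arnold(scrambled, t)
```

The map has determinant 1, so it permutes pixel positions without collisions. The scrambled watermark therefore keeps the same number of ones, and the inverse recovers the original exactly.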

Metrics
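The following metrics are used to measure the robustness and imperceptibility of our video watermark algorithm based on tensor decomposition. The imperceptibility of the watermark is evaluated with the Peak Signal to Noise Ratio (PSNR) and the Mean Square Error (MSE).

$$MSE = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I(i,j) - I'(i,j) \right)^2 \tag{16}$$

where M and N are the height and width of a single-frame image, and I and I' are the original video frame and the watermarked video frame. The smaller the MSE value, the smaller the difference between the watermarked frame and the original frame. PSNR is calculated from MSE as follows (for 8-bit frames with peak value 255):

$$PSNR = 10 \log_{10} \frac{255^2}{MSE} \tag{17}$$

A smaller PSNR means that the distortion of the watermarked frame is more serious. In addition, the bit error rate (BER) and the normalized correlation coefficient (NC) are used to evaluate the robustness of the watermark. The equations are as follows:

$$BER = \frac{1}{m \times m} \sum_{i=1}^{m} \sum_{j=1}^{m} \left| B(i,j) - B'(i,j) \right| \tag{18}$$

$$NC = \frac{\sum_{i=1}^{m} \sum_{j=1}^{m} B(i,j)\, B'(i,j)}{\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{m} B(i,j)^2}\, \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{m} B'(i,j)^2}} \tag{19}$$

where m is the size of the $m \times m$ watermark, and B and B' are the original watermark and the extracted watermark, respectively. The robustness of the watermark increases as NC increases and as BER decreases.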
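A small NumPy sketch of the four metrics follows. The function names are hypothetical, the peak value of 255 assumes 8-bit frames, and the NC formula is one common normalized-correlation definition, assumed here because several variants exist in the watermarking literature.

```python
import numpy as np

def mse(original, marked):
    diff = original.astype(np.float64) - marked.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, marked, peak=255.0):
    error = mse(original, marked)
    return float("inf") if error == 0 else float(10.0 * np.log10(peak ** 2 / error))

def ber(watermark, extracted):
    """Fraction of watermark bits that differ."""
    return float(np.mean(watermark != extracted))

def nc(watermark, extracted):
    """Normalized correlation between two binary watermarks (assumed definition)."""
    w = watermark.astype(np.float64)
    e = extracted.astype(np.float64)
    return float(np.sum(w * e) / (np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(e ** 2))))

# Toy frame: one pixel out of 16 changes by 4 gray levels, so MSE = 16 / 16 = 1.
frame = np.full((4, 4), 100, dtype=np.uint8)
marked = frame.copy()
marked[0, 0] = 104

# Toy watermark with one flipped bit out of four.
w = np.array([[1, 0], [1, 1]])
w_ext = np.array([[1, 0], [0, 1]])

frame_mse = mse(frame, marked)
frame_psnr = psnr(frame, marked)
wm_ber = ber(w, w_ext)
wm_nc = nc(w, w_ext)
```

Casting to floating point before subtracting matters for 8-bit frames, since unsigned integer subtraction would wrap around instead of producing negative differences.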

Experiment parameters
The size of the test video is 352 × 640 and there are 2268 frames in total. The size of the watermark is 18 × 18. One bit of the watermark is embedded into each group of K frames, so the size of each tensor is 352 × 640 × K, with K = 7 in our experiment. The 2268 frames thus form 324 groups, one for each of the 18 × 18 = 324 watermark bits. The number of scrambling iterations t is 15. The relationship between the quantization strength Q and the watermark BER is shown in Figure 5. BER decreases as Q increases, and the watermark is correctly extracted when Q is sufficiently large. Q is set to 2000 in order to ensure both the robustness of the algorithm and the video quality. The PSNR of the first 100 frames of a video when Q = 2000 is shown in Figure 6. The PSNR of the watermarked video is over 40 dB. Examples of our algorithm are shown in Figure 7.

Robustness test
Different attacks are used to test the robustness of our watermark algorithm, including frame swapping, zooming, rotation, cropping, filtering, noising, and black-border filling. In the experiments, our tensor-based watermark algorithm is robust against frame attacks, zooming, rotation, and cropping.
The NC of the watermark extracted after swapping frames within a group is shown in Table 1. The NC of the extracted watermark remains high even if about 50% of the frames in a group are swapped. The modified maximum value of the core tensor has a uniform effect on each frame, because every element in the core tensor takes part in the mode product with the factor matrices during inverse Tucker decomposition.
The relation between the zooming of the video and the NC of the extracted watermark is shown in Figure 8. The watermark is extracted irrespective of whether the video is zoomed in or out, as long as the video is restored to the original resolution. The NC of the extracted watermark is as high as 0.8098 even when the video is zoomed out to 0.1 of its original size.
The watermark is extracted correctly when a black border is filled around a borderless video, because the core tensor represents the main energy of a video and the energy contributed by these zero-valued pixels is tiny. The result is shown in Figure 9, and the experiment results for black-border videos are shown in Table 3. For the same reason, the watermark is extracted correctly irrespective of whether the cropping is at the top, bottom, left, or right side, as long as the cropped part lies within the black border: the contribution of the black border (0-value pixels) to the maximum value of the core tensor is very small.
The watermark is also extracted correctly as long as the direction of the video remains unchanged. The relationship between the rotation angle and the maximum value of the core tensor is shown in Figure 10. The maximum value of the core tensor changes periodically as the rotation angle changes. The watermark is extracted correctly from the watermarked video after rotation correction [25,26]. Some experiment results for the rotation attack are shown in Table 2.
The watermark extraction results under filtering and noise attacks are shown in Table 4.

Conclusions
A grayscale video is represented as a 3-order tensor in order to make full use of the relevance and redundancy among the frames of a video. The core tensor is obtained by Tucker decomposition, and the embedding and extraction of the video watermark are achieved using parity quantization of the maximum value of the core tensor. Because of the reversibility and stability of Tucker decomposition, the watermark information is uniformly distributed across the frames of a video, which preserves the video quality and the imperceptibility of the watermark. The algorithm is robust against various video attacks, especially frame attacks. Our algorithm currently handles only grayscale videos; a color video watermarking method in the tensor domain will be studied in the future.