A MV-Based Steganographic Algorithm for H.264/AVC without Distortion

Abstract: H.264/AVC video is one of the most popular multimedia formats and has been widely used as a carrier for video steganography. In this paper, a novel motion vector (MV) based steganographic algorithm is proposed for H.264/AVC compressed video without distortion. Four modules are introduced to eliminate the distortion caused by the modification of motion vectors and to guarantee the security of the algorithm. In the embedding block, motion vector space encoding is used to embed a (2n+1)-ary notational number into an n-dimensional vector composed of motion vectors produced by the selection block. Before embedding, scrambling is applied to disturb the order of the steganographic carriers and improve their randomness. The re-motion compensation (re-MC) block reconstructs each macroblock (MB) whose motion vectors have been modified by the embedding block. The system block acts as the generator of chaotic sequences and the encryptor of the secret data. Experimental results demonstrate that the proposed algorithm achieves high embedding capacity without visual quality distortion in the stego video, and that it also offers good undetectability against existing MV-based steganalysis features. Performance comparisons with other existing algorithms are provided to demonstrate the superiority of the proposed algorithm.


Introduction
Data hiding exploits the perceptual redundancy of digital signals for the human senses to embed secret data into digital host media without arousing suspicion. Applications of data hiding in the video domain can be coarsely categorized as watermarking, steganography, error recovery (resilience), and general data embedding [Tew and Wong (2013)]. Video steganography has attracted extensive attention owing to the development of video coding standards and network transmission. It falls into three categories: intra-embedding, pre-embedding and post-embedding [Liu, Liu, Wang et al. (2019)]. Intra-embedding methods exploit characteristics of the compression process to hide information during coding, for example in intra-prediction [Nie, Xu, Feng et al. (2018); Li, Meng, Xu et al. (2019)], MVs [Niu, Yang and Zhang (2017)], or quantized discrete cosine transform (QDCT) coefficients [Ma, Li, Tu et al. (2010); Chen, Wang, Wu et al. (2017); Zhang, Zhang, Yang et al. (2017)]. Pre-embedding methods [Mstafa and Elleithy (2016); Sadek, Khalifa and Mostafa (2017)] view the video as a sequence of pictures and embed the secret data directly into frame pixels with a certain intensity. Post-embedding methods [Xu and Wang (2011)] operate mainly on the compressed bitstream, so their computational complexity is low. The modern video coding standards H.26x and MPEG-x both achieve a high compression ratio: redundancy has been eliminated to a great extent and the stream structure is more complex. As a consequence, it is difficult to embed data into the compressed video stream while maintaining good visual quality. Motion vectors usually span a wide range of values and are abundant in compressed video, so they are considered ideal carriers for covert communication, either by modifying their attributes or by adjusting the motion estimation (ME) process. Early MV-based steganography selects a subset of MVs to be modified based on predefined rules.
Jordan et al. [Jordan, Kutter and Ebrahimi (1997)] first proposed using MVs to hide information by slightly modifying them according to parity. However, this algorithm ignores the influence of MV changes during embedding, resulting in a significant increase in the output bitrate of the stego video. Fang et al. [Fang and Chang (2006)] embedded data into MVs by modulating the phase of the MV. Xu et al. [Xu, Ping and Zhang (2006)] argued that modifying MVs with small magnitude degrades video compression efficiency; hence MVs with large magnitude are selected for data embedding. Aly [Aly (2011)] suggested modifying the MVs associated with high block prediction error, hiding one bit in each of their horizontal and vertical components. These algorithms have limited embedding efficiency and embedding capacity. To enhance security and embedding efficiency, steganographic codes have been applied to MV-based steganography. Hao et al. [Hao, Zhao and Zhong (2011)] proposed a steganographic scheme based on matrix encoding. Cao et al. [Cao, Zhao, Feng et al. (2011)] introduced perturbed motion estimation with the wet paper code (WPC), selecting suboptimal MVs for data embedding. Zhang et al. [Zhang, Cao and Zhao (2015)] proposed MV-based steganography that preserves local optimality. Yao et al. [Yao, Zhang, Yu et al. (2015)] defined an MV distortion function that combines the spatial distortion change (SDC) with the prediction error change (PEC), and then embedded data with a two-layered syndrome-trellis code (STC). Yang et al. [Yang and Li (2018)] defined motion vector space encoding, combined with the original exploiting modification direction (EMD) method, for embedding data into H.265/HEVC videos. Zhai et al. [Zhai, Wang and Ren (2019)] were the first to embed in multiple domains for video steganography, combining partition modes (PMs) and MVs through multi-domain embedding (MDE) strategies.
In this paper, we propose an MV-based steganographic algorithm for H.264/AVC, which includes four modules: the system block, the selection block, the embedding block and the re-MC block. The system block takes the private key and generates three chaotic sequences. The first is used for encrypting and decrypting the secret data. The second is provided to the selection block, whose main function is to select qualified steganographic carriers under the control of the chaotic sequence. The last one is used for scrambling, to improve the randomness of the carrier modification. We use motion vector space encoding to slightly and efficiently modify an n-dimensional vector so that it embeds a (2n+1)-ary notational number. We introduce a re-MC block to reconstruct each MB whose MVs have been modified by the previous process. Compared with several related schemes, the proposed scheme not only performs excellently in terms of visual distortion but also maintains a high embedding capacity. The rest of this paper is organized as follows. Preliminaries and notations, including motion estimation and motion vector space encoding, are described in Section 2. The proposed algorithm is described in detail in Section 3. Comprehensive experiments showing the performance of our algorithm are reported in Section 4. Section 5 gives concluding remarks and some future research directions.

Preliminaries and notations

Data partitioning and motion estimation
The hybrid encoding process of H.264 incorporates many new features, including intra-prediction in intra-frames, multiple reference frames, quarter-pixel interpolation, and de-blocking filtering. The H.264 standard supports MVs for different block sizes and finer sub-pixel accuracy, i.e., 1/4-pixel resolution for the luminance component. An MB of an I-frame supports only two partition modes, 16×16 and 4×4. Each MB of an inter-frame can be divided into 16×16, 16×8, 8×16 or 8×8 partitions according to the encoding parameters, and an MB with partition mode 8×8 can be further divided into 8×8, 8×4, 4×8 and 4×4 sub-partitions. Motion estimation (ME) and motion compensation (MC) are necessary procedures to reduce temporal redundancy. The schematic diagram of ME is shown in Fig. 1. ME finds the best-match block for the current block $B$ within a search area $RS$ in the reference frame $R$. The full search (FS) algorithm finds the global best-match block by exhaustively testing all candidate blocks, but its computational complexity is huge. Many fast block-matching algorithms, such as three-step search, four-step search, and diamond search, reduce the computational complexity and find a local best-match block. Mean squared error (MSE), sum of absolute differences (SAD) and mean absolute difference (MAD) are usually used as matching criteria to calculate the prediction error of the target block $B$. Taking SAD as an example,

$$\mathrm{SAD}(B, R_i) = \sum_{x=1}^{M} \sum_{y=1}^{N} \left| B(x,y) - R_i(x,y) \right|,$$

where $B(x,y)$ and $R_i(x,y)$ are the luminance pixel values of the current block $B$ and the candidate reference block $R_i$, and $M$ and $N$ are the width and height of $B$, respectively. The main task of ME is to find the block $R_b$ with minimum prediction error as the local best-match block, $R_b = \arg\min_{R_i \in \Omega} \mathrm{SAD}(B, R_i)$, where $\Omega$ is the set of all candidate blocks within the search area $RS$ in the reference frame. After this step, the prediction error $E = \{ e(x,y) \mid e(x,y) = B(x,y) - R_b(x,y) \}$ undergoes integer transform and quantization to obtain the QDCT coefficients.
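The SAD criterion and the exhaustive full search described above can be illustrated with a minimal sketch. It assumes integer-pixel displacements and a toy frame stored as a list of rows; the function names `sad` and `full_search` and the toy data are illustrative only and are not taken from the JM encoder.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between `block` and the same-sized
    region of `ref` whose top-left corner is at row `top`, column `left`."""
    return sum(
        abs(block[y][x] - ref[top + y][left + x])
        for y in range(len(block))
        for x in range(len(block[0]))
    )


def full_search(block, ref, top, left, search_range):
    """Test every integer displacement within +/-search_range around
    (top, left) and return (best_mv, best_sad), with mv = (dx, dy)."""
    h, w = len(block), len(block[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            t, l = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if t < 0 or l < 0 or t + h > len(ref) or l + w > len(ref[0]):
                continue
            cost = sad(block, ref, t, l)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Toy 6x6 reference frame in which every pixel value is distinct.
reference = [[10 * r + c for c in range(6)] for r in range(6)]
# Current 2x2 block: identical to the reference content at row 3, column 2.
current = [row[2:4] for row in reference[3:5]]
# The block nominally sits at row 2, column 1 of the current frame.
mv, cost = full_search(current, reference, top=2, left=1, search_range=2)
```

In this toy case the search recovers the displacement of one pixel in each direction with zero SAD. Fast algorithms such as diamond search visit only a subset of these candidates.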
Once the best-match block within the search area is found, we obtain the motion vector $MV = (mv_x, mv_y)$, where $mv_x$ and $mv_y$ are the horizontal and vertical components of $MV$, respectively. At the decoder, the predicted value of the current block is acquired from the reference frame by means of $MV$. The final block is recovered jointly from the predicted value and its corresponding prediction error.

Motion vector space encoding

Let $v_i$ denote the MV components used as steganographic carriers, where $i \in \{1, 2, \ldots, N\}$. Then we obtain a vector with $N$ elements, $V = (v_1, v_2, \ldots, v_N)$. Each element $v_i$ of $V$ is mapped to $t_i = f(v_i)$, which yields another $N$-tuple $\tau = (t_1, t_2, \ldots, t_N)$, where $f(\cdot)$ is defined as

$$t_i = f(v_i) = \big( v_i \bmod (2N+1) + 2N + 1 \big) \bmod (2N+1). \qquad (1)$$

Now define an extraction function $F$ as a weighted sum modulo $(2N+1)$:

$$F(t_1, t_2, \ldots, t_N) = \Big( \sum_{i=1}^{N} i \cdot t_i \Big) \bmod (2N+1). \qquad (2)$$

We can construct an $N$-dimensional space based on $\tau$, in which $t_i$ is the coordinate in the $i$-th dimension. This space is called the motion vector space. Any vector $V$ can be mapped to a point $\tau$ in this space, and the mapping value of the point is defined as $F(t_1, t_2, \ldots, t_N)$, which is an integer within $[0, 2N]$. For any point $\tau = (t_1, t_2, \ldots, t_N)$ in the space, $\tau_i^{+} = (t_1, t_2, \ldots, t_i^{+}, \ldots, t_N)$ and $\tau_i^{-} = (t_1, t_2, \ldots, t_i^{-}, \ldots, t_N)$ are the neighbor nodes of $\tau$ in the $i$-th dimension, where $t_i^{+} = (t_i + 1) \bmod (2N+1)$ and $t_i^{-} = (t_i - 1) \bmod (2N+1)$. The neighbor node set of $\tau$, which has $2N$ elements, is $\{\tau_i^{+}, \tau_i^{-} \mid i = 1, 2, \ldots, N\}$.

Theorem. For any point $\tau = (t_1, t_2, \ldots, t_N)$, the mapping values of $\tau$ and of all its neighbor nodes are different from each other, and together they form the consecutive integer set $\{0, 1, \ldots, 2N\}$.

Proof. Assume the mapping value of $\tau = (t_1, t_2, \ldots, t_N)$ is the non-negative integer $d = F(\tau) = F(t_1, t_2, \ldots, t_N) \le 2N$. For the neighbor nodes $\tau_i^{+}$ and $\tau_i^{-}$ we get $d_i^{+} = F(\tau_i^{+}) = (d + i) \bmod (2N+1)$ and $d_i^{-} = F(\tau_i^{-}) = (d - i) \bmod (2N+1)$. Finally, we obtain the set

$$\{ d \bmod (2N+1),\ (d + i) \bmod (2N+1),\ (d - i) \bmod (2N+1) \mid i = 1, 2, \ldots, N \}.$$

The set consists of $2N+1$ elements. Because the offsets $0, \pm 1, \ldots, \pm N$ are pairwise incongruent modulo $2N+1$, these elements are pairwise distinct and correspond one-to-one to the elements of the integer set $\{0, 1, \ldots, 2N\}$ once the values of $d$ and $N$ are determined. Here $d \bmod (2N+1)$ is the mapping value of $\tau$, and $(d + i) \bmod (2N+1)$ and $(d - i) \bmod (2N+1)$ are the mapping values of $\tau_i^{+}$ and $\tau_i^{-}$, respectively; for instance, $(d + 1) \bmod (2N+1)$ and $(d - 1) \bmod (2N+1)$ are the mapping values of $\tau_1^{+}$ and $\tau_1^{-}$. So the theorem is proved.

Assume that there is a $(2N+1)$-ary notational number $d$ and a point $\tau$ with $N$ coordinates. We want to modify at most one coordinate of $\tau$ to obtain $\tau'$, which satisfies $d = F(\tau')$. No modification is needed if $d$ equals $F(\tau)$, i.e., $\tau' = \tau$. When $d \ne F(\tau)$, calculate $s = (d - F(\tau)) \bmod (2N+1)$. If $s$ is no more than $N$, increase the $s$-th coordinate by 1, which alters $\tau$ to $\tau' = \tau_s^{+}$. Otherwise, decrease the $(2N+1-s)$-th coordinate by 1, which alters $\tau$ to $\tau' = \tau_{2N+1-s}^{-}$.
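The mapping of Eq. (1), the extraction function of Eq. (2) and the single-coordinate modification rule can be sketched as follows. The weights 1, 2, ..., N are those implied by the worked extraction example later in the paper. The branch for s greater than N (decrease coordinate 2N+1-s by one) is the usual EMD-style complement and is an assumption here. All function names are illustrative.

```python
def to_coord(v, n):
    """Eq. (1): map a (possibly negative) MV component into [0, 2n]."""
    m = 2 * n + 1
    return (v % m + m) % m


def extract(tau):
    """Eq. (2): weighted sum of the coordinates modulo 2n + 1."""
    n = len(tau)
    return sum(i * t for i, t in enumerate(tau, start=1)) % (2 * n + 1)


def embed_digit(mvs, d):
    """Change at most one MV component by +/-1 so that the extracted
    value of the result equals the (2n+1)-ary digit d."""
    n = len(mvs)
    m = 2 * n + 1
    tau = [to_coord(v, n) for v in mvs]
    s = (d - extract(tau)) % m
    out = list(mvs)
    if 1 <= s <= n:
        out[s - 1] += 1          # move to neighbor tau_s^+
    elif s > n:
        out[m - s - 1] -= 1      # assumed complement: neighbor tau_{m-s}^-
    return out


def neighbor_values(tau):
    """Mapping values of tau and of its 2n neighbor nodes."""
    n = len(tau)
    m = 2 * n + 1
    values = {extract(tau)}
    for i in range(n):
        for step in (1, -1):
            neighbor = list(tau)
            neighbor[i] = (neighbor[i] + step) % m
            values.add(extract(neighbor))
    return values


mvs = [3, -2, 0, 7]              # hypothetical MV components, n = 4
stego = embed_digit(mvs, 5)      # embed the 9-ary digit 5
recovered = extract([to_coord(v, len(stego)) for v in stego])
```

Here `neighbor_values` checks the theorem numerically for one point: the point and its 2N neighbors cover every value in {0, ..., 2N}, which is why a change of at most one coordinate by one step is always sufficient.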

Proposed scheme
In this section, we first generate three pseudo-random sequences using a chaotic system under the private key of the sender. The first pseudo-random sequence is used to encrypt the to-be-embedded data and thus ensure the security of the secret data. The second one is used to determine whether a candidate MB is chosen as a steganographic carrier. The last one is combined with a shuffle rule to disturb the order of the steganographic carriers. Data embedding and extraction are described in subsections 3.2 and 3.3, respectively. The general structure of the proposed data hiding scheme is shown in Fig. 2, and Fig. 3 is the block diagram of the data extraction procedure.

Pretreatment
As shown in Fig. 2, the proposed algorithm includes four modules: the system block, the steganographic carrier selection block, the data embedding block, and the re-MC block.
In the system block, a chaotic system is used to generate three pseudo-random sequences S1, S2 and S3 under the private key K. For the chaotic system, we take the Logistic map, one of the most widely used nonlinear discrete chaotic maps, which can be expressed as

$$x_{k+1} = \lambda x_k (1 - x_k). \qquad (4)$$

Any initial value $x_0 \in (0,1)$ generates a chaotic sequence $x_1, x_2, \ldots, x_k, \ldots$ when $\lambda \in (3.5699, 4]$. The system exhibits different characteristics for different $\lambda$ and reaches a chaotic state in this range. The chaotic sequence is then transformed into a binary sequence by Eq. (5), where $\bar{x} = \frac{1}{m} \sum_{k=1}^{m} x_k$, $m$ is the length of the chaotic sequence generated by Eq. (4), and $\alpha \in [0.5, 1]$ is a weight. It should be noted that the parameters mentioned above, namely $\lambda$, $x_0$ and $m$, are generated under the control of the secret key K. Thus, S1, S2 and S3 are finally generated by Eqs. (4)-(5). Stream encryption is used to encrypt the to-be-embedded data $M$ to improve the security of the data. The key stream S1 comes from the binary sequences generated by the chaotic system. The encrypted binary sequence $M'$ is obtained simply through the XOR (exclusive OR) operation $M'(i) = M(i) \oplus S_1(i)$.
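A minimal sketch of the system block follows: the Logistic map of Eq. (4), a binarization step, and the XOR stream cipher. The exact binarization rule of Eq. (5) is not recoverable from the text, so the threshold used here (a bit is 1 when the sample exceeds the weighted mean) is an assumption, as are the parameter values and function names.

```python
def logistic_sequence(x0, lam, m):
    """Iterate x_{k+1} = lam * x_k * (1 - x_k) and return m samples."""
    xs, x = [], x0
    for _ in range(m):
        x = lam * x * (1.0 - x)
        xs.append(x)
    return xs


def binarize(xs, weight=1.0):
    """ASSUMED rule: bit k is 1 when x_k exceeds weight * mean(xs)."""
    threshold = weight * sum(xs) / len(xs)
    return [1 if x > threshold else 0 for x in xs]


def xor_stream(bits, key_bits):
    """Bitwise XOR of a message with a key stream of at least equal length."""
    return [b ^ k for b, k in zip(bits, key_bits)]


# Hypothetical parameters that would be derived from the private key K.
chaos = logistic_sequence(x0=0.37, lam=3.99, m=8)
s1 = binarize(chaos)

message = [1, 0, 1, 0, 0, 1, 1, 0]
encrypted = xor_stream(message, s1)      # M'(i) = M(i) xor S1(i)
decrypted = xor_stream(encrypted, s1)    # XOR with the same key inverts it
```

Because XOR is its own inverse, the receiver recovers the message by regenerating the same S1 from the shared key.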

Data embedding
In the proposed algorithm, MVs are chosen as the steganographic carrier. When the decoder decodes the current block, the MV and the reference frame of the current block are combined for prediction. If the MV has been modified, the predicted value $P'$ no longer equals the original predicted value $P$, and the reconstructed block $P' + E$, formed with the original residual $E$, exhibits severe visual distortion, which is unacceptable for a steganographic algorithm. Therefore, re-MC is introduced at the encoder to re-acquire the actual predicted value $P'$ from the modified MV and to compute the corresponding residual $E'$.
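The effect of re-MC can be seen in a toy sketch. It assumes integer-pixel MVs and ignores transform and quantization, so reconstruction is exact; a real encoder would only recover the block up to quantization error. The helper names and pixel values are illustrative.

```python
def predict(ref, top, left, mv, height, width):
    """Fetch the prediction block that mv = (dx, dy) points to in ref."""
    dx, dy = mv
    return [[ref[top + dy + y][left + dx + x] for x in range(width)]
            for y in range(height)]


def subtract(a, b):
    """Element-wise a - b for two equally sized blocks."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two equally sized blocks."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


reference = [[10 * r + c for c in range(6)] for r in range(6)]
block = [[35, 31], [44, 47]]            # current block at row 2, column 2
mv, stego_mv = (0, 0), (1, 0)           # MV before and after embedding

pred = predict(reference, 2, 2, mv, 2, 2)
residual = subtract(block, pred)        # residual for the original MV

stego_pred = predict(reference, 2, 2, stego_mv, 2, 2)
# Without re-MC: modified MV combined with the stale residual.
naive = add(stego_pred, residual)
# With re-MC: recompute the residual against the new prediction.
remc_residual = subtract(block, stego_pred)
remc = add(stego_pred, remc_residual)
```

The stale residual reconstructs a shifted, wrong block, whereas the recomputed residual reproduces the current block exactly.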
The main function of the selection block is to select the steganographic carriers under the control of the pseudo-random sequence S2. MBs with mode P8×8 are chosen as the candidates. This choice rests on the following three considerations, which trade off bit-rate, visual quality, and resistance to steganalysis.
1) B-frames are highly compressed by bidirectional prediction and account for a high proportion of the compressed video stream. Modifying their MVs would cause a significant bit-rate variation.
2) There is a certain relationship between the partition mode and the texture complexity of an MB. An MB with mode 16×16 is, to a certain extent, smoother than a 4×4 MB, so modifying an MB with mode 16×16 causes visible distortion more easily.
3) The more steganographic carriers are selected, the easier it is for a steganalysis algorithm to detect them, so only MBs with mode P8×8 are selected.
To make the selection of steganographic carriers random and resist attacks, we introduce the pseudo-random sequence S2 to determine whether a candidate MB carries secret data. If the current bit of S2 is one, the MB with mode P8×8 is chosen as a steganographic carrier. First, the MVs of the selected MB are grouped into an MV vector $V$, and the scrambled vector $V'$ is obtained using Algorithm 1 under the pseudo-random sequence S3. Second, we modify $V'$ to $V''$ such that $F(V'') = d$, where $d$ is a $(2N+1)$-ary notational number taken from the encrypted binary sequence $M'$, as shown in Algorithm 2. Finally, we restore the original order of the modified MV vector $V''$, which completes the embedding process.
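The scramble, embed and restore steps for one carrier can be sketched as below. Algorithms 1 and 2 are not reproduced in this text, so `key_permutation` is a hypothetical stand-in for the shuffle rule (a Fisher-Yates shuffle driven by key bits) and `embed_digit` follows the single-coordinate rule of motion vector space encoding. Neither is claimed to match the authors' exact procedures.

```python
def extract_value(vec):
    """Weighted sum of the mapped components modulo 2N + 1."""
    m = 2 * len(vec) + 1
    return sum(i * (v % m) for i, v in enumerate(vec, start=1)) % m


def embed_digit(vec, d):
    """Change at most one component by +/-1 so that extract_value == d."""
    n = len(vec)
    m = 2 * n + 1
    s = (d - extract_value(vec)) % m
    out = list(vec)
    if 1 <= s <= n:
        out[s - 1] += 1
    elif s > n:
        out[m - s - 1] -= 1
    return out


def key_permutation(n, key_bits):
    """HYPOTHETICAL shuffle rule: Fisher-Yates with swap indices read
    from a sliding 8-bit window over the key bit string."""
    perm = list(range(n))
    start = 0
    for i in range(n - 1, 0, -1):
        window = "".join(key_bits[(start + k) % len(key_bits)] for k in range(8))
        j = int(window, 2) % (i + 1)
        perm[i], perm[j] = perm[j], perm[i]
        start += 1
    return perm


def embed_in_carrier(mvs, d, s3_bits):
    """Scramble, embed one digit, then restore the original order."""
    perm = key_permutation(len(mvs), s3_bits)
    scrambled = [mvs[p] for p in perm]
    modified = embed_digit(scrambled, d)
    restored = [0] * len(mvs)
    for k, p in enumerate(perm):
        restored[p] = modified[k]
    return restored


def extract_from_carrier(mvs, s3_bits):
    """Receiver side: scramble with the same key and read the digit."""
    perm = key_permutation(len(mvs), s3_bits)
    return extract_value([mvs[p] for p in perm])


carrier = [1, 2, 0, 3]
stego_carrier = embed_in_carrier(carrier, 6, "01110110")
digit = extract_from_carrier(stego_carrier, "01110110")
```

Since the receiver derives the same permutation from S3, it never needs the unscrambled order, and the stego MVs differ from the originals in at most one component.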

Data extraction
Data extraction is the inverse process of data embedding under $S_1$, $S_2$ and $S_3$. The detailed process is presented in Fig. 3. First, the MBs with mode P8×8 are selected from the compressed video stream. The pseudo-random sequence $S_2$ indicates whether the current MB contains secret data. If it does, the MV vector is scrambled by Algorithm 1 under $S_3$. Extracting the secret data is then straightforward: construct $\tau$ and obtain its mapping value $d$. Finally, all mapping values are extracted and grouped into the encrypted stream $M'$, and the embedded data stream $M$ is decrypted from $M'$ by $M(i) = M'(i) \oplus S_1(i)$.

Figure 3: The block diagram of the data extraction procedure

For example, assume $V = \{1,2,0,3\}$, $S_1 = 00101011$, $S_3 = 01110110$, and that $V$ is a carrier that the selection block considers to contain secret information. Scrambling it under the secret key $S_3$ gives $V' = \{2,0,1,3\}$. The mapping value of $V'$ is $d = F(V') = (1 \times 2 + 2 \times 0 + 3 \times 1 + 4 \times 3) \bmod 9 = 8$, i.e., 1000 in binary. Decrypting with the first four bits of $S_1$ gives $1000 \oplus 0010 = 1010$, which is the same as $M$.
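The arithmetic of the example above can be replayed in a few lines. The scrambled vector is taken as given, because the shuffle rule itself is not reproduced here; the 4-bit binary form of the digit and the use of the first four key bits follow the example.

```python
scrambled = [2, 0, 1, 3]                    # scrambled MV vector from the example
modulus = 2 * len(scrambled) + 1            # 2N + 1 = 9 for N = 4

# Weighted sum with weights 1..N, then reduction modulo 2N + 1.
weighted = sum(i * t for i, t in enumerate(scrambled, start=1))
d = weighted % modulus

cipher_bits = format(d, "04b")              # mapping value as 4 bits
key_bits = "00101011"[:4]                   # first four bits of S1
plain_bits = "".join(
    str(int(c) ^ int(k)) for c, k in zip(cipher_bits, key_bits)
)
```

The weighted sum is 17, which reduces to 8 modulo 9, and the XOR with the key bits yields the plaintext bits stated in the example.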

Experiment results and analysis

Experimental setting
Our experiments are implemented on the H.264 reference encoder software JM 19.0. Six standard video sequences (QCIF, YUV 4:2:0, 176×144) are used. The first 100 frames of each test video are encoded at 30 frames/s with an intra-period of nine. Because almost all existing literature ignores B-frames even though compressed video streams contain a large number of them, we insert different numbers of B-frames between two P-frames. The number of inserted B-frames is denoted Nbf and ranges from 1 to 5 in our experiments. We also conduct comparative experiments under different quantization parameter (QP) values of 24, 28 and 32. The QP value is the same for P-slices, B-slices and I-slices; the remaining encoder parameters keep their default values. To comprehensively evaluate the performance of the proposed scheme, we set up different experiments to analyze visual quality, embedding capacity, bit-rate variation, and anti-steganalysis performance in the following subsections.

Visual quality
In this subsection, we use subjective and objective techniques to evaluate the visual quality of stego videos (H.264/AVC video with embedded data) against cover videos (standard H.264/AVC video without embedded data). Fig. 4 shows some decompressed frames from several stego video sequences under the condition Nbf=3, QP=28; all of them are P-frames. There are no significant visual distortions in these frames, so subjectively the proposed algorithm meets the requirement of imperceptibility.

Figure 4: The visual quality of the 5th, 23rd, 50th, 68th and 86th frames of stego video

Objective measures are also used to evaluate visual quality. Tab. 1 gives the average structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) values of the cover and stego videos. The PSNR and SSIM values labeled "original" are calculated between the original and cover videos; those labeled "marked" are calculated between the original and stego videos.
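For reference, the PSNR between two frames can be computed as in the following sketch. It assumes 8-bit luminance samples (peak value 255) stored as lists of rows; the function name and the toy frames are illustrative, and SSIM is omitted because it requires windowed statistics.

```python
import math


def psnr(frame_a, frame_b, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf          # identical frames
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 120], [140, 160]]
decoded = [[101, 121], [141, 161]]    # every sample off by one (MSE = 1)
value = psnr(original, decoded)
```

A per-video figure such as those in Tab. 1 would then be the average of this value over all frames.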
It is straightforward to see that the difference between original and marked is insignificant in both PSNR and SSIM. The average PSNR degradation is no more than 0.003 dB; it is worth noting that the average PSNR even increases slightly when Nbf=5. The average SSIM values of marked and original are indistinguishable. Fig. 5 depicts the SSIM and PSNR of each frame for marked and original, showing that the per-frame values of marked remain highly consistent with those of original, without conspicuous differences. The proposed algorithm therefore performs very well in terms of visual distortion.

Embedding capacity
The PSNR difference (DPSNR) and the maximum embedding capacity are shown in Tab. 2. The lowest average embedding capacity is 5087 bits, at QP=32 and Nbf=3, where the average DPSNR is only 0.0070 dB. The highest average embedding capacity is 20327 bits, at QP=24 and Nbf=1, where the average DPSNR is only -0.0021 dB, which means that the quality of the video is hardly affected by data embedding. Overall, the embedding capacity decreases as QP and Nbf increase. The reason is that the number of macroblocks with mode P8×8 declines as the quality of the compressed video decreases. Under the same QP, different Nbf values lead to different embedding capacities. The proposed algorithm performs well in terms of both visual distortion and embedding capacity, and the embedding operation has almost no effect on the visual quality of stego videos.

Bitrate variation
The bit-rate increase (BRI) is defined as follows to further evaluate the bitrate performance of the proposed algorithm:

$$\mathrm{BRI} = \frac{R_{stego} - R_{cover}}{R_{cover}} \times 100\%,$$

where $R_{stego}$ is the bitrate of the stego video and $R_{cover}$ is the bitrate of the cover video. Tab. 3 shows the average BRI for different QP and Nbf values: the maximum average BRI is no more than 0.538% and the minimum is only 0.205%. As described in Section 3.2, the data embedding process has a high embedding efficiency, so few MBs need to be reconstructed and the bitrate growth remains modest.
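As a small illustration, the bitrate metric can be computed as below, under the assumption that it is the relative increase of the stego bitrate over the cover bitrate, expressed in percent. The function name and the sample bitrates are hypothetical and are not values from Tab. 3.

```python
def bitrate_increase(stego_bitrate, cover_bitrate):
    """Relative bitrate growth of the stego video over the cover, in percent."""
    return (stego_bitrate - cover_bitrate) / cover_bitrate * 100.0


# Hypothetical bitrates in kbit/s.
increase = bitrate_increase(stego_bitrate=100.5, cover_bitrate=100.0)
```

A negative value would indicate that the stego stream is smaller than the cover stream.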

Performance comparison
In this subsection, we compare the proposed algorithm with three other algorithms. Zhu et al. [Zhu, Wang and Xu (2010)] embedded the data by modulating the best search point during quarter-pixel motion estimation. Su et al. [Su, Zhang, Zhang et al. (2014)] introduced diamond coding to hide information through slight modulation of MVs. To show the advantages of our algorithm, we also include a method that, instead of MVs, modifies part of the middle- and high-frequency nonzero QDCT coefficients [Chen, Wang, Wu et al. (2017)]. Because each steganographic algorithm has its own unique properties, an exact comparison is difficult. We therefore collected related experimental data, including the PSNR difference (DPSNR), capacity, and BRI, for several test video sequences. As Tab. 4 shows, the proposed algorithm is significantly superior in DPSNR to Zhu et al. [Zhu, Xu and Wang (2010)], Su et al. [Su, Zhang, Zhang et al. (2014)] and Chen et al. [Chen, Wang, Wu et al. (2017)]. The embedding capacity of the proposed scheme is lower than that of Zhu et al. except for the video News, but generally higher than those of Su et al. and Chen et al. Since Chen et al. focus on controlling the bitrate growth of the stego video, all the other algorithms, including ours, are inferior to theirs in BRI; nevertheless, the BRI of the proposed algorithm is better than those of Zhu et al. and Su et al. From this comparison, we conclude that our scheme not only achieves excellent PSNR performance but also has a low BRI and a high embedding capacity.

Anti-steganalysis performance
In this subsection, experiments are conducted to evaluate the anti-steganalysis performance of our steganographic algorithm. There are many steganalysis methods against MV-based steganography, such as AoSo (adding or subtracting one) [Wang, Zhao and Wang (2014)] and NPELO (near-perfect estimation for local optimality) [Zhang, Cao and Zhao (2016)]. Feature extraction with AoSo and NPELO is performed on each compressed video at the GOP (group of pictures) level. For each kind of feature, we set up an experiment to evaluate anti-steganalysis performance. 60% of the cover-stego pairs are selected randomly for training SVMs (support vector machines) with a Gaussian kernel using the LibSVM toolbox [Chang and Lin (2011)]. For simplicity, we select 1/5 of the features of the cover-stego pairs for classification. Before training each SVM, five-fold cross-validation (CV) is used to determine the optimal penalty parameter from the set $\{10^{-3}, \ldots, 10^{5}\}$. We choose $10^{3}$ and $10^{5}$ as the penalty parameters for the AoSo and NPELO feature classifiers, respectively. Each experiment is repeated 10 times, and the average detection accuracy is taken as the final performance measure. The results are shown in Tab. 5. The average detection accuracies of AoSo and NPELO against our algorithm are 54.33% and 64.45%, respectively. If cover source mismatch (CSM) is taken into account, the real detection accuracy would be further reduced and might be as poor as random guessing. Therefore, we have reason to believe that our embedding scheme performs well against steganalysis.

Conclusions
We have presented a novel MV-based steganographic algorithm that addresses visual distortion and capacity limitation for H.264/AVC. Four modules are introduced to eliminate the distortion and ensure the security of the algorithm. The main idea is to use motion vector space encoding to modulate the MVs to embed the secret information, and then to reconstruct the affected MBs to eliminate the distortion. The experimental results above show that the algorithm achieves good embedding capacity without visual distortion and keeps the bitrate increase low, and the anti-steganalysis experiments indicate good performance against the AoSo and NPELO features. In future work, we will focus on the robustness of the algorithm, since, like almost all current steganographic algorithms (as opposed to watermarking algorithms), it is fragile.