Novel Side Information Generation Algorithm of Multiview Distributed Video Coding for Multimedia Sensor Networks

The traditional multiview distributed video coding scheme, which codes all regions uniformly, may distort the decoder-side estimation of intense-motion regions. This paper presents a novel multiview distributed video coding algorithm. In the main view, the algorithm identifies the intense-motion regions of the Wyner-Ziv frame according to an ROI criterion and entropy-codes their low-frequency DCT coefficients so that the decoder can generate the best temporal side information. For the non-intense-motion regions, the algorithm uses motion-compensated interpolation (MCI) to generate side information. Finally, the side information is obtained by fusing the temporal and spatial side information. Experimental results show that the proposed algorithm achieves more accurate motion estimation in intense-motion regions. The quality of the decoded image is improved at the same transmission rate, so the energy consumption of sensor nodes is ultimately reduced.


Introduction
In recent years, along with the rapid development of wireless multimedia communication technology [1], the demand for digital video has grown. People would like to see objects characterized more clearly and realistically: the traditional single-view video network can only provide two-dimensional vision and cannot deliver a good three-dimensional visual sense, so multiview video networks have emerged. A multiview video network, constrained by low power, limited storage, and limited computational and communication ability, requires not only low-complexity encoding but also real-time video encoding and transmission. Traditional video coding standards, such as MPEG-x or H.26x, rely mainly on a hybrid architecture in which the encoder uses motion estimation to fully exploit the temporal and spatial correlation of video sequences. Because of the heavy computational burden of motion estimation and compensation in these standards, the encoder is 5 to 10 times more complex than the decoder [2,3]. Traditional video coding systems are therefore unsuitable, and novel coding methods are required. A new video codec framework, distributed video coding (DVC), which uses intraframe encoding and interframe decoding, has attracted wide attention from scholars. The decoder exploits the correlation of video signals for interframe prediction, so DVC removes the complexity of interframe prediction from the encoder. Distributed video coding, with its low-complexity encoding and good robustness, can meet the needs of these new video applications very well.
Several DVC frameworks have been proposed, such as Girod and Aaron's Wyner-Ziv video coding [4,5], PRISM (power-efficient robust high-compression syndrome-based multimedia) [6], Zixiang Xiong's layered DVC [7], Sehgal's state-free DVC [8], wavelet-based DVC [9], and multiview distributed video coding [10-12]. The DVC algorithms based on turbo or LDPC codes proposed in [10-12] treat all regions of the Wyner-Ziv frame uniformly [13,14], so motion estimation cannot accurately predict the areas with intense motion. The decoder then cannot accurately generate the temporal side information (temporal SI), which is merged with the spatial side information (spatial SI) to form the final side information. As a result, the decoder must request more feedback information, which not only increases the rate but also leaves parts of the decoded image insufficiently accurate. Focusing on this problem, this paper proposes an improved multiview distributed video coding algorithm. In the main view, the intense-motion and non-intense-motion regions are obtained according to an ROI criterion. For the intense-motion regions, the algorithm entropy-codes their low-frequency DCT coefficients. The decoder uses the decoded low-frequency DCT coefficients for bidirectional hash motion estimation, which lets it choose between past and future reference frames for frame interpolation and thus obtain the best temporal side information. For the non-intense-motion regions, the algorithm uses motion-compensated interpolation (MCI) to generate temporal side information. Finally, the side information is obtained by fusing the temporal and spatial side information. Simulation results show that this algorithm improves the efficiency of the intense-motion regions, thereby reducing the bit rate while improving the quality of the decoded images; the energy consumption of sensor nodes is ultimately reduced.
The rest of the paper is organized as follows. Section 2 introduces the basic principles of multiview DVC. Section 3 presents the multiview DVC framework based on the DCT hash in detail. Section 4 gives the experimental results. Finally, Section 5 concludes the paper.

Multiview Distributed Video Coding System
The goal of distributed source coding (DSC) is to transfer the complexity of the encoder to the decoder in order to achieve efficient compression. In multiview distributed video coding (MDVC), the side information is generated by interpolation within and across cameras. Multiview video systems tend to produce large amounts of strongly correlated data, so eliminating redundant information and improving the compression ratio is one of the key problems of MDVC. At the same time, the power consumption of the camera sensor nodes is limited, so a low-complexity encoder is needed that avoids complex communication between nodes.
Figure 1 depicts a multiview DVC system architecture [10-12] with two views, which are assumed to be static. The camera of the first view, the intracamera, works in the traditional way; its video stream is encoded independently of the other camera. The camera of the second view, the Wyner-Ziv camera, also encodes independently but uses the other video stream for decoding. The frames of the first view are encoded with conventional intracoding, while the second view uses DVC. The key frames (K) of the second view are encoded and decoded with a conventional intraframe codec. Between the key frames are Wyner-Ziv frames, which are intraframe-encoded but interframe-decoded. The encoder applies an 8 × 8 DCT to each W frame, and the coefficients are quantized with a uniform scalar quantizer. The Slepian-Wolf coder is implemented with a low-density parity-check (LDPC) code. The parity bits produced by the LDPC encoder are stored in a buffer, which transmits a subset of them to the decoder upon request. If the decoder cannot reliably decode the bits, additional parity bits are requested from the encoder buffer through the feedback channel. The request-and-decode process is repeated until an acceptable probability of bit error is guaranteed. The decoded bits are reconstructed as DCT coefficients, and the frame is finally generated by inverse quantization and inverse DCT of the reconstructed coefficients. The multiview side information is a fusion of temporal and spatial side information and is therefore more accurate. In past research, the Slepian-Wolf codec usually adopted turbo codes for error correction, but LDPC codes, with their excellent performance and simple form, have recently been gaining increasing interest.
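The request-and-decode loop described above can be sketched as follows. This is only an illustrative sketch: the names `parity_buffer` and `try_decode` are hypothetical stand-ins, and a real Slepian-Wolf decoder works on bit-planes of quantized DCT coefficients rather than abstract chunks.

```python
def request_and_decode(parity_buffer, try_decode, max_rounds=None):
    """Sketch of the decoder-side feedback loop (illustrative names).

    parity_buffer: iterator over chunks of LDPC parity bits held in the
    encoder buffer; try_decode(bits) returns (success, frame) given all
    parity bits received so far.
    """
    received = []
    for rounds, chunk in enumerate(parity_buffer, start=1):
        received.extend(chunk)   # feedback: request one more parity subset
        ok, result = try_decode(received)
        if ok:                   # acceptable bit-error probability reached
            return result
        if max_rounds is not None and rounds >= max_rounds:
            break
    return None                  # decoding failed within the budget
```

In this sketch the decoder keeps accumulating parity bits until `try_decode` succeeds, mirroring the repeated feedback requests of the Slepian-Wolf codec.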

Multiview DVC Framework Based on DCT Hash.
As in [10-12], when the Wyner-Ziv frames (W frames) of multiview distributed video coding are treated without distinction among regions, motion estimation cannot accurately predict the areas with intense motion, and the decoder cannot accurately generate the temporal side information (temporal SI). To address this problem, this paper presents an improved multiview distributed coding algorithm. In the main view, we select intense-motion macroblocks as ROI macroblocks; the low-frequency DCT coefficients of the ROI macroblocks are sent to help side-information creation at the decoder, improving the coding efficiency and the decoded image quality. Figure 2 illustrates the multiview distributed video coding (MDVC) framework based on the DCT hash. The first view uses a conventional intracoding scheme, and its decoded video stream is transformed by homography into the spatial side information for the W frames of the second view. The K frames of the second view are encoded and decoded with a conventional intraframe scheme, while the W frames use a combination of LDPC coding and entropy coding. At the encoder, an ROI discrimination algorithm divides the W frame into ROI macroblocks and non-ROI macroblocks (8 × 8 each). For each ROI macroblock, the low-frequency DCT coefficients are selected as the DCT hash and entropy-coded. The residuals of the ROI and non-ROI macroblocks are encoded and decoded with LDPC coding. At the decoder, if a DCT hash is available to guide the motion estimation process, bidirectional hash-based interpolation is performed; otherwise, MCI is used. The temporal and spatial side information are then fused to generate the best side information. The side information is used in the reconstruction to obtain the decoded DCT coefficients; finally, inverse quantization (IQ) and IDCT are applied to generate the decoded frame W′.
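The DCT hash of an ROI macroblock (the "DC + 8 AC" low-frequency coefficients used in the experiments) can be sketched as below. The orthonormal 2-D DCT is implemented directly to keep the sketch self-contained, and the zig-zag scan table is an illustrative assumption covering only the coefficients needed.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (no external deps)."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)   # DC row normalization
    return C @ block @ C.T

def dct_hash(block, n_ac=8):
    """DC + first n_ac AC coefficients in zig-zag order (assumed scan)."""
    zigzag = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1),
              (0, 2), (0, 3), (1, 2), (2, 1)]
    coeffs = dct2(np.asarray(block, dtype=np.float64))
    return [coeffs[r, c] for r, c in zigzag[:n_ac + 1]]
```

Only these nine coefficients are entropy-coded and sent; the rest of the macroblock is handled by the LDPC-coded residual.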

ROI Macroblock Selection and Temporal Side Information Generation
Similar to our previous work [13], in the hash-based motion estimation described previously, hash bits of the W frame are sent for all macroblocks to help the decoder generate the side information. However, in most video sequences the motion vectors of many macroblocks are zero or very small, and only a small portion of blocks has large displacement. For many macroblocks, MCI already gives a good estimate, so it is not necessary to send a DCT hash for them. The encoder therefore uses an SAD criterion to distinguish ROI macroblocks from non-ROI macroblocks. With the current frame X_W and the previous reference frame X_P, the SAD criterion is

\mathrm{SAD}(B_i) = \sum_{(x, y) \in B_i} \left| X_W(x, y) - X_P(x, y) \right|,

where B_i denotes each macroblock and (x, y) are the pixel coordinates inside the 8 × 8 macroblock. If SAD ≥ T, the macroblock is an ROI macroblock; otherwise, it is a non-ROI macroblock. An adequate threshold T has been found experimentally.
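The SAD-based ROI classification can be sketched as follows. The threshold of 64 matches the value found experimentally in Section 4, while the function and array layout are illustrative assumptions.

```python
import numpy as np

def classify_rois(current, previous, block=8, threshold=64):
    """Split a frame into ROI / non-ROI macroblocks by SAD.

    current, previous: 2-D grayscale arrays of equal shape (X_W, X_P).
    Returns a boolean map with one entry per macroblock (True = ROI).
    """
    h, w = current.shape
    rows, cols = h // block, w // block
    roi = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            cur = current[r*block:(r+1)*block, c*block:(c+1)*block]
            prev = previous[r*block:(r+1)*block, c*block:(c+1)*block]
            # sum of absolute differences over the 8x8 macroblock
            sad = np.abs(cur.astype(np.int32) - prev.astype(np.int32)).sum()
            roi[r, c] = sad >= threshold
    return roi
```

Only macroblocks flagged True would have their DCT hash entropy-coded; the rest fall back to MCI at the decoder.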

As in our previous work [13], the temporal side information \hat{B}_R of ROI macroblock B_R is generated by bidirectional hash motion estimation:

\hat{B}_R(x, y) = \frac{\left| mv_2(x, y) \right| B_P\!\left(x + mv_{x1}(x, y),\, y + mv_{y1}(x, y)\right) + \left| mv_1(x, y) \right| B_F\!\left(x + mv_{x2}(x, y),\, y + mv_{y2}(x, y)\right)}{\left| mv_1(x, y) \right| + \left| mv_2(x, y) \right|},

where B_P and B_F are the best-matching macroblocks of the ROI macroblock B_R in the past and future reference frames K_P and K_F; mv_1(x, y) is the motion vector of B_R relative to B_P, and mv_2(x, y) is the motion vector of B_R relative to B_F; mv_x(x, y) and mv_y(x, y) denote the x and y components of a macroblock motion vector.
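The bidirectional interpolation of an ROI macroblock can be sketched as below. Matched blocks from the past and future key frames are weighted by the magnitude of the opposite motion vector, so the closer reference contributes more; border handling and sub-pixel motion are omitted, and the function signature is an illustrative assumption.

```python
import numpy as np

def bidirectional_interpolate(K_P, K_F, x, y, mv1, mv2, block=8):
    """Weighted bidirectional interpolation of one ROI macroblock.

    K_P, K_F: past / future reference frames; (x, y): top-left corner of
    the macroblock; mv1, mv2: (dx, dy) motion vectors toward K_P and K_F.
    """
    (dx1, dy1), (dx2, dy2) = mv1, mv2
    bp = K_P[y+dy1:y+dy1+block, x+dx1:x+dx1+block].astype(np.float64)
    bf = K_F[y+dy2:y+dy2+block, x+dx2:x+dx2+block].astype(np.float64)
    n1 = np.hypot(dx1, dy1)       # |mv1|
    n2 = np.hypot(dx2, dy2)       # |mv2|
    if n1 + n2 == 0:              # static block: plain average
        return (bp + bf) / 2
    return (n2 * bp + n1 * bf) / (n1 + n2)
```

A macroblock with a zero vector toward the past frame is taken entirely from the past reference, which is the intended limiting behavior of the weighting.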

Spatial Side Information Generation.
Adjacent video sensor nodes monitor the same target scene from different locations and angles; because of this multi-angle correlation, the video sequence of an adjacent node at the same moment can be used to generate spatial side information by homography. We assume that all nodes in the wireless multimedia sensor network are time synchronized. Since the position and perspective of each video sensor node are fixed, only one homographic matrix transformation is needed. The homography is a 3 × 3 matrix that relates video sensor node V_1 to another node V_2 in homogeneous coordinates. As in [3,15], each point (x_1, y_1) of V_1 is mapped to a point (x_2, y_2) of V_2 up to a scale μ such that

x_2 = \frac{a_0 x_1 + a_1 y_1 + a_2}{a_6 x_1 + a_7 y_1 + 1}, \qquad y_2 = \frac{a_3 x_1 + a_4 y_1 + a_5}{a_6 x_1 + a_7 y_1 + 1},

where a_0, a_1, ..., a_7 are the 8 motion parameters. When a_6 = a_7 = 0, the model is an affine transformation; when a_0 = a_4 = 1 and a_1 = a_3 = a_6 = a_7 = 0, it is a pure translation; when a_0 = a_4, a_1 = −a_3, and a_6 = a_7 = 0, it is a translation-zoom-rotation model. The model parameters can be computed with the gradient descent method proposed in [3,15].
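The 8-parameter point mapping can be sketched as follows; the parameter values in the usage note are illustrative, not calibrated from real cameras.

```python
def homography_map(x1, y1, a):
    """Map a point from view V1 to V2 with the 8-parameter model.

    a = (a0, ..., a7); the scale mu is eliminated by the division.
    """
    a0, a1, a2, a3, a4, a5, a6, a7 = a
    mu = a6 * x1 + a7 * y1 + 1.0          # common denominator (scale)
    x2 = (a0 * x1 + a1 * y1 + a2) / mu
    y2 = (a3 * x1 + a4 * y1 + a5) / mu
    return x2, y2
```

For example, `a = (1, 0, 3, 0, 1, 5, 0, 0)` is a pure translation by (3, 5), and any nonzero `a6` or `a7` activates the projective denominator.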

Fusion Side Information Generation.
Combining the temporal and spatial side information yields the fusion side information. We build a binary fusion mask in which 0 indicates that the pixel is taken from the spatial side information and 1 indicates that it is taken from the temporal side information. The fusion process can be described simply as follows: the temporal and spatial side information are compared with the previous key frame; if the spatial side information is closer to the pixel value of the key frame, the binary mask is set to 0, and if the temporal side information is closer, it is set to 1. The same processing is performed with the future key frame, giving a second binary mask. Finally, an OR operation between the two binary masks produces the binary fusion mask. The fusion process is shown in Figure 3.
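The mask construction and fusion can be sketched as below. Tie-breaking toward the temporal side information (the `<=` comparison) is an assumption not specified in the text.

```python
import numpy as np

def fusion_mask(temporal_si, spatial_si, key_prev, key_next):
    """Binary fusion mask: 1 = take the temporal SI, 0 = spatial SI.

    For each key frame, a pixel is marked 1 where the temporal SI is at
    least as close to the key frame as the spatial SI (ties go to the
    temporal SI, an assumption); the per-key masks are then OR-ed.
    """
    def temporal_closer(key):
        return np.abs(temporal_si - key) <= np.abs(spatial_si - key)
    mask = temporal_closer(key_prev) | temporal_closer(key_next)
    return mask.astype(np.uint8)

def fuse(temporal_si, spatial_si, mask):
    # Take the temporal SI where mask == 1, the spatial SI elsewhere.
    return np.where(mask == 1, temporal_si, spatial_si)
```

The OR of the two masks means a pixel falls back to the spatial side information only when the spatial estimate is closer to both key frames.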

Experiments and Analysis
Similar to our previous work [13], in our simulation experiments the LDPC codes are generated by the PEG algorithm [16] with a code rate of 7/8. To change the rate, we vary the number of quantization levels, so the LDPC encoder produces different output bit rates and different compression ratios are obtained. After several experimental comparisons, 64 was chosen as the threshold of the ROI criterion, and the DC + 8 AC low-frequency DCT coefficients are selected as the DCT hash. The proposed scheme is tested on the exit and ballroom [17] QCIF (176 × 144) video sequences (25 fps, 100 frames each). The video stream of camera 1 uses the H.263 coding scheme; camera 0 uses distributed video coding, with the key frames K and Wyner-Ziv frames W coded alternately in the sequence "K-W-K-W". We compare the rate-distortion performance of multiview distributed video coding using only temporal side information, only spatial side information, and the fusion side information, as well as H.263 intraframe coding (I-I-I-I), H.263 interframe coding (I-P-P-P), and JPEG coding. The H.263+ codec uses TMN8. The experimental results are shown in Figure 4. Multiview video coding performs significantly better (2 to 3 dB) than H.263 intraframe coding, while its overall complexity is lower; a certain gap from H.263 interframe coding remains. The proposed approach brings improvements of up to 0.2-0.5 dB over multiview distributed video coding that uses only temporal side information, because it obtains more accurate motion estimation in the intense-motion regions and thus better side information. The decoder uses belief propagation (BP) over cycle-free Tanner graphs in the iterative decoding process.
Figure 5 shows the decoded images (the 17th frame) of the "exit" and "ballroom" sequences using the fusion side information. Figure 6 shows the decoded images (the 17th frame) of the same sequences using only temporal side information. The subjective quality of the decoded images is improved.

Conclusion
In this paper, a novel multiview distributed video coding algorithm is presented. In the main view, we select ROI macroblocks based on an SAD criterion; bidirectional hash-based interpolation generates the side-information macroblocks for the intense-motion areas, while the non-intense-motion areas use motion-compensated interpolation (MCI). Finally, the side information is obtained by fusing the temporal and spatial side information. Experimental results demonstrate the algorithm's efficiency.

Figure 2: Multiview distributed video coding framework based on the DCT hash.