Multi-rate Sampling Wavelet Lifting Scheme Based Video Compression with Enhanced Adaptive Rood Search Integral Projections

Abstract - This paper presents a video compression method built on a multi-rate wavelet lifting scheme, which works for both colour and grayscale videos, combined with Enhanced Adaptive Rood Search with Integral Projections for motion estimation. In the multi-rate lifting scheme, sampling is performed at different rates in the upper and lower branches. Lifting is a powerful alternative to traditional convolution with forward and inverse filter banks because it requires substantially fewer arithmetic computations. The ratio used is 3:2 for the upper branch and 3:1 for the lower branch of the lifting scheme, so more low-frequency coefficients are preserved than high-frequency coefficients, giving better picture quality with a small compromise in compression ratio. Listless SPECK (LSK) is used as the encoder, and an Enhanced Adaptive Rood Search technique has been developed for motion estimation to overcome a limitation of Adaptive Rood Search, which does not consider the diagonal direction. Experiments show that the proposed method produces better compression results, with better quality and lower latency, than existing methods.


Introduction
In video processing, especially in building image compressors, there is a delicate balance between the perceived quality of reconstructed video frames and the compressed size. One usually comes at the expense of the other. Within the framework of a given transform coding scheme, the aim is to achieve better compression without compromising visual quality. Numerous researchers have developed methods towards this goal; the contributions of a few of them are summarised below.
The H.264 video compression standard is widely used in video surveillance and takes video compression technology to a new level: it delivers good-quality video streams at higher frame rates and higher resolutions for the same bit rate as previous standards or, conversely, the same quality at a lower bit rate. H.264 [1] is expected to replace other compression standards and methods in use today. Network video products that support both H.264 and Motion JPEG are ideal for maximum flexibility and integration possibilities, which leaves the motion estimation algorithm open for integration. The authors of [2] presented a PCA-based method for video compression. Although this method improves quality, as measured by PSNR, for large videos, its latency is higher.
Another video compression technique strives to exploit the temporal redundancy in the video to improve compression efficiency with minimum processing complexity. It includes a 3D-to-2D transformation of the video that allows the temporal redundancy to be explored using 2D transforms and avoids the computationally demanding motion compensation step. To reconcile the seemingly contradictory goals of quality and compression, a multi-rate sampling technique [3] has been devised in which the low-frequency sub-bands are given more prominence through milder sub-sampling, in conjunction with heavier sub-sampling on the high-frequency branches of the wavelet decomposition. The effectiveness of this scheme rests on the premise that the low-frequency components preserve the essential details of the image, while the high-frequency components accentuate prominent details such as edges and rapid transitions in colour. An unbalanced wavelet scheme with various ratios in the upper and lower bands has been developed and tested on images [4]. Although it works well on images, that paper does not discuss videos, where motion estimation becomes more important and the ratios must be chosen to suit it. It also uses linear interpolation, which does not take the underlying pattern of the signal into account.
The proposed lifting structure has down-sampling ratios of 3:2 and 3:1 in the upper and lower branches, respectively, and also takes advantage of performing the update step before the predict step. The idea is to preserve the lower-frequency components to obtain a better PSNR [5]. It uses the Lagrange interpolator to improve the quality of the reconstructed signal. Improvement in video compression comes from developing better encoders, better motion estimation and customised transforms. Here a multi-rate wavelet lifting scheme based on the Haar wavelet has been developed, with a Lagrange polynomial for better reconstruction. The challenge in multi-rate video compression has been matching the motion estimation algorithm and the encoding technique to the transform [6]. A multi-rate wavelet lifting scheme with a quartic polynomial for interpolation, together with the Enhanced Adaptive Rood Search motion estimation algorithm, has been developed. It is discussed in the subsequent sections, and the results section shows the improvement obtained with the proposed technique. A balanced lifting scheme with the Haar wavelet is also developed for comparison; the Haar wavelet is used because it produced better results with LSK than other wavelets. The multi-rate wavelet lifting scheme is then compared with the balanced Haar wavelet lifting scheme.
The motion estimation method used, Adaptive Rood Search with Spatio-Temporal correlation and Integral Projections (ARS-STIP), is enhanced here. The original method was confined to the left adjacent block and the block immediately above the candidate block, on the assumption that camera motion is mostly horizontal and vertical [7]. Under this assumption, for videos with larger frame sizes, as in HDTV, a lot of data may be lost and the correlation may be poor, which can reduce the quality of the reconstructed frame and the PSNR. To overcome this limitation, the top-left block in the row immediately above the candidate block is also considered, which improves picture quality through better correlation [8]. This modified motion estimation algorithm is termed Enhanced Adaptive Rood Search with Spatio-Temporal correlation and Integral Projections (EARS-STIP).
The LSK encoder has been further improved by considering a pyramid that represents three sub-bands. If all three sub-bands are found insignificant when tested at some bit plane, especially at higher levels, one zero is sent instead of three zeros, thereby increasing the compression ratio. LSK is also used effectively for colour videos. The video format used is 4:2:0. The metrics used are PSNR, compression ratio and the Structural Similarity Index [9].

Proposed Multi-rate Sampling Wavelet of Use
In video processing, especially in building image compressors, there is a delicate balance between the perceived quality of reconstructed video frames and the compressed size [10]. One usually comes at the expense of the other. Within the framework of a given transform coding scheme, the aim is to achieve better compression without compromising visual quality. To reconcile these seemingly contradictory goals, a multi-rate sampling technique has been devised in which the low-frequency sub-bands are given more prominence through milder sub-sampling, in conjunction with heavier sub-sampling on the high-frequency branches of the wavelet decomposition [11]. The effectiveness of this scheme rests on the premise that the low-frequency components preserve the essential details of the image, while the high-frequency components accentuate prominent details such as edges and rapid transitions in colour [12]. Figure 1 shows the proposed lifting structure, which has down-sampling ratios of 3:2 and 3:1 in the upper and lower branches, respectively, and performs the update step before the predict step, which maintains stability. The idea is to preserve the lower-frequency components to obtain a better PSNR.
In the predict-then-update case, the stability problem cannot be solved by synchronization alone, i.e., by making the encoder base its choice of predictor on quantized data. The reason is that the reconstructed values are obtained from the quantized low-pass values [13], and the low-pass signal is itself a function of the predictor chosen. Hence the encoder cannot obtain the quantized values until it selects a predictor, and it cannot select a predictor without first obtaining the quantized coefficients.
A simple modification solves both the stability and synchronization problems [14]: reverse the order of the predict and update lifting steps in the wavelet transform. First, the even samples are updated based on the odd samples, yielding the low-pass coefficients. These low-pass coefficients are then reused to predict the odd samples, which gives the high-pass coefficients [15]. A linear update filter is used, and only the choice of predictor depends on the data.
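As a small illustration of what the 3:2 and 3:1 ratios imply for the number of coefficients, the following sketch counts the samples kept in each branch for one 1-D pass. It assumes that 3:2 means two output samples for every three input samples and 3:1 means one output sample for every three; the function name `branch_lengths` is hypothetical, and the zero padding to a multiple of 3 follows the remark made in the conclusions.

```python
def branch_lengths(n_samples):
    """Return (padded_length, low_count, high_count) for one 1-D pass.

    Assumption: 3:2 keeps 2 of every 3 samples in the upper (low-frequency)
    branch and 3:1 keeps 1 of every 3 in the lower (high-frequency) branch.
    """
    padded = -(-n_samples // 3) * 3   # round up to a multiple of 3 (zero padding)
    low = padded * 2 // 3             # upper branch, ratio 3:2
    high = padded // 3                # lower branch, ratio 3:1
    return padded, low, high


for width in (1920, 352):
    print(width, branch_lengths(width))
```

Under this reading the two branches together hold as many coefficients as the padded input, with twice as many low-frequency as high-frequency coefficients.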
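The update-first ordering can be sketched on an ordinary balanced (2:1) Haar-style lifting step. This is only a minimal illustration of the ordering and of perfect reconstruction, not the multi-rate 3:2 / 3:1 structure of Figure 1; the function names and the simple averaging update are assumptions made for the example.

```python
def forward_update_first(x):
    """One level of update-then-predict lifting on an even-length list."""
    even, odd = x[0::2], x[1::2]
    # Update step (linear): even samples are updated from the odd samples,
    # giving the low-pass coefficients.
    low = [(e + o) / 2 for e, o in zip(even, odd)]
    # Predict step: odd samples are predicted from the low-pass coefficients;
    # the prediction error is the high-pass coefficient.
    high = [o - l for o, l in zip(odd, low)]
    return low, high


def inverse_update_first(low, high):
    """Undo the two lifting steps in reverse order."""
    odd = [h + l for h, l in zip(high, low)]
    even = [2 * l - o for l, o in zip(low, odd)]
    x = [0.0] * (2 * len(low))
    x[0::2], x[1::2] = even, odd
    return x


signal = [4, 6, 10, 2]
low, high = forward_update_first(signal)
print(low, high)
print(inverse_update_first(low, high))
```

Because the predictor only ever sees the low-pass coefficients, a decoder that holds the same (possibly quantized) low-pass values can repeat any data-dependent predictor choice without side information.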

Fig. 1: Block Diagram of multirate wavelet
Because the update is performed first and the transform is iterated only on the low-pass coefficients, the low-pass coefficients throughout the entire pyramid depend linearly on the data and are not affected by the non-linear predictor. Figure 2 shows the input sequence being decomposed in the ratios 3:2 and 3:1. The update and prediction coefficients are derived below in Equations (1) to (9).

Prediction Coefficients Calculations
The predictor removes as much information as possible from the lower branch, leaving the detail coefficients. This is done with the Lagrange interpolation technique. The simple predictor is the Lagrange polynomial, which gives better interpolation and good reconstruction properties.
In Equation (8), more weight is given to the X u sample at t = 1.5T than to the X u sample at t = 3T, because C occurs at t = 2T; the sample that is closer to C is given more weight, with the k value added for stability. In general, the predicted value is described by the equation.
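The weighting argument can be checked with a two-point Lagrange interpolation. The sketch below assumes only the time positions given in the text (samples at t = 1.5T and t = 3T, target at t = 2T, with T taken as 1); it omits the stabilising k term, whose value is not given here, and the helper name `lagrange_weights` is hypothetical.

```python
from fractions import Fraction


def lagrange_weights(nodes, t):
    """Lagrange basis weights of the samples at `nodes`, evaluated at time t."""
    weights = []
    for i, ti in enumerate(nodes):
        w = Fraction(1)
        for j, tj in enumerate(nodes):
            if j != i:
                w *= Fraction(t - tj) / Fraction(ti - tj)
        weights.append(w)
    return weights


# Upper-branch samples at t = 1.5T and t = 3T, predicted sample C at t = 2T.
nodes = [Fraction(3, 2), Fraction(3)]
w_near, w_far = lagrange_weights(nodes, Fraction(2))
print(w_near, w_far)
```

With these positions the nearer sample receives weight 2/3 and the farther one 1/3, which matches the statement that the sample closer to C is weighted more heavily.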

LSK-Listless Speck
LSK is the Listless Set Partitioning Embedded block coder [16]; here a small enhancement has been made to it. One zero is assigned for several insignificant sub-bands instead of three, and this section also shows how LSK is used for colour videos. Compared with the original LSK, this reduces the coding length and time, shortens the string to be transmitted and so leads to further compression. This pyramid tracking is done only during significance testing at the higher bit planes. Figures 3 and 4 show the pyramid, the state array diagram, the Morton-order scanning and the linear indexing for the colour planes in LSK.
The coder keeps track of set partitions within a wavelet band. Like LSK [17], ALSK uses strictly breadth-first tests because the set-splitting rules of both are the same, though the two coders produce different output bit strings. Three passes are used per bit plane. The first is the insignificant pixel (IP) pass, in which each lone insignificant pixel is tested for significance. The second is the insignificant set (IS) pass, which tests each multiple-pixel set for significance. The IS pass comprises two passes [18]: the insignificant level (IL) pass, which tests a wavelet pyramid level for insignificance, and the insignificant group of levels (IGL) pass, which tests several wavelet levels for insignificance. The third is the refinement (REF) pass, which refines pixels found significant in previous bit-plane passes. The IL and IGL passes within the IS pass are effective for the first few higher bit planes. As the scanning of bit planes proceeds from MSB to LSB, the IL and IGL passes are skipped, because most of the wavelet coefficients become significant at the lower bit planes [19].
For colour videos the three planes used are Y, Cb and Cr. The linear indexing for the colour planes is shown in Figure 4(b). Figures 3a and 3b show the LSK pyramid and state array diagram, together with the order in which they are scanned so as to retain the low-frequency components that carry most of the information; Figures 3c and 3d show the scanning of the colour planes, which is arranged to preserve the edges in colour images. Linear indexing avoids the lists otherwise needed for storing this information, and hence there is a considerable reduction in memory. In the proposed LSK algorithm, the pre-computed arrays are eliminated at the expense of repeated searching of blocks for significant coefficients [20]. If each sub-band coefficient is stored in T bytes, the bulk storage memory needed is RCT bytes for the sub-band data and RC/2 bytes (half a byte per pixel) for the state table. For an L-level wavelet decomposition, the MPi and MVi state tables need [3 × (L − 1) + 4 + (L − 1)] / 2 bytes. The total memory required is given by Equation (10).
In the luma plane no sub-sampling is done, while the chroma planes are sub-sampled. The standard used is 4:2:0 [21]. Flat and mild quantization yields the best results. It is also observed that vector quantization does not produce better results in spite of its heavy computational overhead.
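The one-zero-for-three idea can be sketched as follows. This is a minimal illustration under stated assumptions rather than the actual LSK bit stream: a set is taken to be significant at bit plane n when some coefficient magnitude reaches 2^n, and the leading "1" flag emitted when the level is not entirely insignificant is an assumption added so that the output is decodable. The names `is_significant` and `code_level` are hypothetical.

```python
def is_significant(coeffs, n):
    """A set is significant at bit plane n if any |c| >= 2**n."""
    threshold = 1 << n
    return any(abs(c) >= threshold for c in coeffs)


def code_level(subbands, n):
    """Significance bits for the three sub-bands of one pyramid level."""
    flags = [is_significant(sb, n) for sb in subbands]
    if not any(flags):
        return "0"  # one zero stands for three insignificant sub-bands
    # Assumed framing: a '1' flag followed by one bit per sub-band.
    return "1" + "".join("1" if f else "0" for f in flags)


level = [[3, -2], [1, 0], [5, -7]]   # toy coefficients of three sub-bands
print(code_level(level, 4))          # high bit plane: nothing reaches 16
print(code_level(level, 2))          # lower bit plane: third sub-band reaches 4
```

The saving appears at the higher bit planes, where whole levels are usually insignificant, which is consistent with the pyramid tracking being applied only there.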
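For a sense of scale, the individual memory terms quoted above can be evaluated numerically. The sketch assumes R and C are the frame rows and columns, T the bytes per coefficient and L the number of decomposition levels, and it reads the bracketed expression as the byte count of the MPi and MVi tables, which is an interpretation of the text. It does not attempt to reproduce Equation (10), and the function name `lsk_memory_terms` is hypothetical.

```python
def lsk_memory_terms(rows, cols, bytes_per_coeff, levels):
    """Return the three memory terms (in bytes) named in the text."""
    subband_bytes = rows * cols * bytes_per_coeff            # RCT
    state_table_bytes = rows * cols / 2                      # RC/2, half a byte per pixel
    pyramid_table_bytes = (3 * (levels - 1) + 4 + (levels - 1)) / 2
    return subband_bytes, state_table_bytes, pyramid_table_bytes


# Example: a 512 x 512 plane, 2 bytes per coefficient, 5 decomposition levels.
print(lsk_memory_terms(512, 512, 2, 5))
```

The level-dependent term is a handful of bytes, so the storage is dominated by the coefficient array and the half-byte-per-pixel state table.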

Modified Rood Search
The ARS-ST method, illustrated in Figure 5, considered only blocks B and D, assuming that camera movements are mostly horizontal and vertical. However, as the frame size increases, especially with HDTV, a lot of information may be lost under this assumption [22]. To increase accuracy, the ARS-ST motion estimation is modified.
Here, in addition to B and D, block C, i.e., the top-left block, is also considered. The Mean Absolute Difference (MAD) between the motion vectors of two blocks is taken as the motion correlation between them, as represented by Equations (11) to (13), which include the MAD between blocks A and C and, likewise, the MAD between the other pairs of blocks.

Fig. 5: Spatial and Temporal block for motion
The best candidate is selected for motion vector prediction. For each block in the first row, the motion vector is predicted from its immediate left neighbouring block B (Equation (12)); for each block in the first column, the motion vector is predicted from its immediate top neighbouring block D (Equation (13)).
Block matching is performed in two stages, using two search patterns. In the first stage, the initial search stage, the lengths of the horizontal and vertical rood arms are determined adaptively and individually by a simple and efficient rood pattern [23]. This search pattern reduces unnecessary intermediate searches and the risk of being trapped in local minima by attempting to place the search origin for the subsequent steps close to the global minimum of the error surface [24]. In the second stage, a fixed-size diamond search is applied repeatedly until the least cost is found at the centre of the search pattern.
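The second stage can be sketched as a loop that re-centres a unit diamond until the centre itself has the least cost. This is the plain fixed-size diamond refinement only, without the extra diagonal point of the enhanced search described later; the cost function is passed in as a parameter, and the name `unit_diamond_search` and the `max_steps` guard are assumptions of the example.

```python
def unit_diamond_search(cost, start, max_steps=64):
    """Repeat a unit diamond around the best point until the centre wins."""
    centre = start
    best = cost(centre)
    for _ in range(max_steps):
        cx, cy = centre
        vertices = [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
        candidate = min(vertices, key=cost)
        candidate_cost = cost(candidate)
        if candidate_cost >= best:
            break                      # least cost is at the centre: stop
        centre, best = candidate, candidate_cost
    return centre, best


# Toy error surface with its minimum at motion vector (3, -2).
toy_cost = lambda mv: abs(mv[0] - 3) + abs(mv[1] + 2)
print(unit_diamond_search(toy_cost, (0, 0)))
```

In practice the starting point would be the least-cost point of the initial rood stage, so only a few diamond iterations are normally needed.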

Rood Size
This is based on the observation that the MV distribution is generally higher in the horizontal, vertical and top-left directions than in other directions [25].

Fig. 6: Adaptive Rood Search Points
Circle marks in Figure 6 depict the search points on the rood pattern and in the diagonal direction with respect to the current candidate vector; the choice of neighbouring blocks also follows from the raster scanning order. The lengths of the two horizontal and the two vertical arms are given by Equations (14) to (17). The search pattern is made more flexible by adjusting the horizontal and vertical arms individually, as in Equation (15). A fixed size of 2 pixels is used for both arm lengths for the first block of the first row. For the other blocks of the first row, the two arm lengths have the same value and are determined from the MV components of their nearest left adjacent blocks using Equation (17). The point indicated by the predicted MV, which has a high probability of approximating the real MV, must be examined in addition to the 5 points at the centre and vertices of the rood [26]. Thus six points are checked during the initial search stage, as shown in Figure 7. The point with the least cost, computed using integral projections, is selected as the centre for the search in the next stage.
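The six-point initial stage with an integral-projection cost can be sketched as below. It assumes that the cost is the sum of absolute differences between the row sums and column sums of the current block and of the displaced reference block, that a motion vector is written (dx, dy), and that the arm lengths are already known; all function names are hypothetical and no boundary checking is done.

```python
def projections(frame, top, left, size):
    """Row sums and column sums (integral projections) of a size x size block."""
    rows = [sum(frame[top + r][left + c] for c in range(size)) for r in range(size)]
    cols = [sum(frame[top + r][left + c] for r in range(size)) for c in range(size)]
    return rows, cols


def projection_cost(cur, ref, top, left, mv, size):
    """Sum of absolute differences between projections for motion vector mv."""
    dx, dy = mv
    cur_rows, cur_cols = projections(cur, top, left, size)
    ref_rows, ref_cols = projections(ref, top + dy, left + dx, size)
    return sum(abs(a - b) for a, b in zip(cur_rows + cur_cols, ref_rows + ref_cols))


def rood_candidates(arm_x, arm_y, predicted_mv):
    """Centre, four rood vertices and the predicted MV (duplicates removed)."""
    points = [(0, 0), (arm_x, 0), (-arm_x, 0), (0, arm_y), (0, -arm_y),
              tuple(predicted_mv)]
    return list(dict.fromkeys(points))


# Toy frames: the current frame is the reference shifted by 2 pixels horizontally.
ref = [[3 * r * r + 5 * c * c + r * c for c in range(12)] for r in range(12)]
cur = [[ref[r][min(c + 2, 11)] for c in range(12)] for r in range(12)]
cands = rood_candidates(2, 2, (1, 1))
best = min(cands, key=lambda mv: projection_cost(cur, ref, 4, 4, mv, 4))
print(cands, best)
```

Comparing 2N projection values instead of N × N pixels per candidate is what reduces the cost of each evaluation for an N × N block.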

Enhanced Diamond Search
The least-cost point found in the Adaptive Rood Search becomes the centre of the unit diamond search in the next stage. Let the centre of the diamond [27] be at f, the least-cost point of the Adaptive Rood Search. If the minimum cost belongs to one of the vertex points of the diamond, the next diamond search is performed with that point as the centre [28]. If the minimum cost belongs to the centre of the diamond, the two lowest costs among the four vertex points a, b, c and d are found. One more point, diagonal to the centre (i.e., ab, bd or cd) and lying in the direction of these two minimum-cost points, is then evaluated (Figure 7).
Figure 8 shows the proposed video compression scheme. The proposed scheme uses the LSK algorithm to intra-code the I-frame. After motion estimation is performed, the residual is encoded for the P-frame. Once motion estimation is done using the EARS-STIP method, the P-frame is reconstructed from the calculated motion vectors and the reference frame. Here the original reference frame is not used directly; instead, the reference frame that has been compressed with the LSK algorithm and then decompressed is used. The reconstructed P-frame is known as the predicted P-frame, and the difference between the original P-frame and the predicted P-frame is called the residual. This residual is transmitted along with the motion vectors to produce predictive frames of good quality. The residual is also coded using LSK so that the size of the residual to be transmitted decreases. The rate at which the residual is coded is much lower than the rate at which the I-frame is coded. The block diagram in Figure 8 describes the entire video compression process. SPIHT and SPECK [29] are not used for intra-coding of frames because they are normally considered computationally complex, whereas LSK can be implemented easily and quickly. It is appropriate for hardware implementation; its only disadvantage is that it requires increased memory. The speed of the encoder is also improved by using EARS-STIP for motion estimation.
The metrics used for the results are PSNR, SSIM and compression ratio, tested under various quantization parameter indices. Figure 10 shows the original and reconstructed videos of different types for the proposed method and for the corresponding method used in the compared research paper. Of the 8 videos, the results for five are shown as images and the remaining three are tabulated. Tables 1 to 4 show the results obtained. In the second part, the results of the proposed multi-rate wavelet lifting scheme are compared with results reported in other research papers that use different techniques, on the video frames considered in those papers.

Video Encoder for Proposed Scheme
Here PSNR and compression ratio are considered. All simulations are done in MATLAB. Figure 9 shows the quality of videos with the proposed method compared with the balanced Haar wavelet method. Tables 1 to 4 compare PSNR, compression ratio and SSIM (structural similarity index measure) across quantization parameters, and the corresponding graphs are shown in Figures 11-14. The results show that the proposed method performs better than the balanced Haar wavelet transform. Tables 5 and 6 show that the proposed method produces relatively better compression ratio and PSNR than the SCS, KCDS and MWBVEA methods reported in other research papers.
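For reference, the two headline metrics are conventionally computed as in the sketch below. It assumes 8-bit frames (peak value 255), the usual definition of PSNR from the mean squared error, and compression ratio as original size divided by compressed size; it is written in Python for illustration and is not the MATLAB code used for the reported simulations.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf            # identical frames
    return 10 * math.log10(peak * peak / mse)


def compression_ratio(original_bits, compressed_bits):
    """Ratio of the uncompressed size to the compressed size."""
    return original_bits / compressed_bits


frame = [[10, 20], [30, 40]]
noisy = [[12, 20], [30, 38]]
print(psnr(frame, noisy), compression_ratio(800, 100))
```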

Conclusions
A multi-rate wavelet lifting scheme has been developed to increase PSNR and SSIM by preserving more low-frequency components. The multi-rate wavelet was designed to operate efficiently on videos of any frame size; it is particularly attractive when the frame dimensions are divisible by 3, as in HDTV, and otherwise a suitable number of zeros is padded. The ratio used was 3:2 in the upper branch and 3:1 in the lower branch of the lifting scheme. The lifting scheme was also modified by interchanging the update and predict steps to maintain stability. The multi-rate wavelet lifting scheme produced better results than the balanced Haar wavelet lifting scheme in terms of PSNR and SSIM. The Haar wavelet was used in the balanced scheme because it produced better results with LSK. Overall, the multi-rate wavelet produced better quality with a slightly lower compression ratio (less than 5% lower) than the balanced wavelet scheme.
The Adaptive Rood Search motion estimation algorithm has been enhanced with better block searching: to obtain a better predicted motion vector, all the neighbouring blocks of the candidate block are considered, whereas the existing method considered only the immediate left and top blocks. An improved unit diamond search has also been developed to give a better final search for small motion, which improves the motion estimation. The method gives better correlation for spatial and temporal activity within and between video frames, which improves the average PSNR. The use of integral projections reduces the computational complexity. The method works well for any frame size, including HDTV. The existing method had difficulty with large frames because it did not consider the top diagonal block of the candidate block; the correlation was therefore poorer, more data was lost and the PSNR was reduced.