Improved Motion Estimation Using Early Zero-Block Detection



INTRODUCTION
The newest international video coding standard, H.264/AVC, has recently been approved by the ITU-T (as Recommendation H.264) and by ISO/IEC (as MPEG-4 Part 10, Advanced Video Coding (AVC)) [1]. H.264/AVC achieves significantly better performance in both PSNR and visual quality at the same bit-rate than prior video coding standards such as MPEG-4 Part 2 and H.263. Two important contributors to this gain are variable block-size motion estimation and rate-distortion optimization; however, performing motion estimation for every variable block-size mode dramatically increases the computational complexity of H.264/AVC.
Many fast and efficient motion estimation (ME) methods have been proposed in recent years to reduce computational cost while maintaining coding performance. In general, there are two ways to reduce computation. One is to speed up the ME algorithm itself, as in the hybrid unsymmetrical-cross multihexagon-grid search (UMHexagonS) algorithm [2], which has been adopted in the JM reference software. The other is to terminate the ME calculation early by detecting zero-blocks (ZBs), that is, blocks whose discrete cosine transform (DCT) coefficients are all zero after quantization.
Xie et al. [3] established a zero-block condition based on a criterion involving $X(0,0) = \frac{1}{N}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x(i,j)$, where $x(i,j)$ denotes the residual samples between the current macroblock and the reference macroblock. For H.264, the quantizer step size is related to the quantization parameter by $Q_{step} = 0.625 \cdot 2^{QP/6}$. This criterion has been employed in the JM reference software. In [4, 5], early zero-block detection was applied to the motion search by comparing the sum of absolute differences of an 8 × 8 block ($SAD_{8\times8}$) with a threshold of $20Q_{step}$ to decide whether the 8 × 8 DCT is a zero-block; the motion search stops once all zero-blocks are detected. This yields significant computational savings, especially for low bit-rate coding. However, the threshold of $20Q_{step}$ (corresponding to $5Q_{step}$ for the 4 × 4 discrete cosine transform and quantization (DCT/Q)) is not a sufficient condition: it can improperly detect a large number of zero-blocks, leading to severe degradation in coding performance.
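As a quick numerical check of the step-size relation, the sketch below evaluates $Q_{step} = 0.625 \cdot 2^{QP/6}$ together with the $20Q_{step}$ threshold on $SAD_{8\times8}$ used in [4, 5] and its $5Q_{step}$ counterpart on $SAD_{4\times4}$. The function names are illustrative assumptions. Note that the closed form is exact only when QP is a multiple of 6; the step-size table of the standard differs slightly in between (for example, 0.6875 rather than about 0.70 at QP = 1).

```python
def qstep(qp: int) -> float:
    """Quantizer step size from the closed form Qstep = 0.625 * 2^(QP/6)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in the range 0..51")
    return 0.625 * 2 ** (qp / 6)


def threshold_sad8x8(qp: int) -> float:
    """Zero-block threshold of [4, 5] applied to SAD_8x8: 20 * Qstep."""
    return 20 * qstep(qp)


def threshold_sad4x4(qp: int) -> float:
    """Equivalent threshold on SAD_4x4: an 8x8 block holds four 4x4 blocks, so 5 * Qstep."""
    return 5 * qstep(qp)


if __name__ == "__main__":
    for qp in (16, 28, 30, 48):
        print(qp, round(qstep(qp), 3), round(threshold_sad8x8(qp), 1), round(threshold_sad4x4(qp), 1))
```

Doubling every 6 QP steps is what makes fixed multiples of $Q_{step}$ grow so quickly at low bit-rates: at QP = 48 the $5Q_{step}$ threshold on a single 4 × 4 block is already 800.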

EURASIP Journal on Image and Video Processing

Some sufficient but not necessary conditions for zero-block detection of DCT coefficients after quantization were derived by examining the sum of absolute differences (SAD) between the current macroblock and the reference macroblock [6, 7]. Although every block these conditions detect is a true zero-block, numerous zero-blocks still remain undetected. Building on Moon's method [7], a technique using an adaptive threshold was proposed to enhance zero-block detection capability [8].
In this work, we derive a nearly sufficient condition based on the ensemble average of all 4 × 4 DCT coefficients. This condition for zero-block detection is then applied to both the motion search and the DCT/Q calculation in the UMHexagonS algorithm. The experimental results show that it achieves a significantly larger computation reduction than methods using the two sufficient conditions, while high coding efficiency is maintained.

A NEARLY SUFFICIENT CONDITION FOR ZERO-BLOCK DETECTION
To allow an integer implementation, the 4 × 4 DCT in H.264/AVC is approximated in a form parameterized by $a = 1/2$, $b = \sqrt{2/5}$, and $c = 1/2$. The basic quantization operation is given by $Z_{ij} = \mathrm{round}(Y_{ij}/Q_{step})$. The quantization parameter (QP) takes values from 0 to 51, and the quantizer step size $Q_{step}$ is used to control bit-rate and video quality. With the post-scaling factor (PF) incorporated into the quantizer, the quantized output $Z_{ij}$ can be written as $Z_{ij} = \mathrm{round}(W_{ij} \cdot PF/Q_{step})$, where $W_{ij}$ is an entry of the core 2D transform $W = CXC^T$. To avoid any division, the factor $PF/Q_{step}$ is implemented as a multiplication factor followed by a right shift, $PF/Q_{step} = M(QP\%6;\, r)/2^{qbits}$ with $qbits = 15 + \lfloor QP/6 \rfloor$, where $r = 2 - (i\%2) - (j\%2)$, % denotes the modulo operator, and $M(QP\%6;\, r)$ is the multiplication factor. The quantized coefficient can then be computed with integer arithmetic as $|Z_{ij}| = (|W_{ij}| \cdot M(QP\%6;\, r) + f) \gg qbits$, with $\mathrm{sign}(Z_{ij}) = \mathrm{sign}(W_{ij})$, where $\gg$ denotes a binary right shift and $f$ is $2^{qbits}/6$ for inter blocks or $2^{qbits}/3$ for intra blocks.
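The integer quantization path described above can be sketched as follows. This is a minimal illustration, not JM code: it uses the usual H.264 core matrix C and multiplication factors, and it assumes the mapping from $r = 2 - (i\%2) - (j\%2)$ to position class given in the comments (r = 2 where i and j are both even, r = 0 where both are odd, r = 1 otherwise), since the layout of the paper's $M(QP\%6;\, r)$ table is not reproduced here. The helper names `core_transform`, `quantize_4x4`, and `is_zero_block` are hypothetical.

```python
from typing import List

Matrix = List[List[int]]

# Forward core transform matrix of H.264 (integer approximation of the 4x4 DCT).
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]

# Multiplication factors for QP % 6 = 0..5, keyed by r = 2 - (i % 2) - (j % 2).
# Assumed mapping: r = 2 -> i, j both even; r = 1 -> mixed parity; r = 0 -> both odd.
M_TABLE = {
    2: [13107, 11916, 10082, 9362, 8192, 7282],
    1: [8066, 7490, 6554, 5825, 5243, 4559],
    0: [5243, 4660, 4194, 3647, 3355, 2893],
}


def matmul(p: Matrix, q: Matrix) -> Matrix:
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def core_transform(x: Matrix) -> Matrix:
    """W = C X C^T for a 4x4 residual block X."""
    c_t = [list(col) for col in zip(*C)]
    return matmul(matmul(C, x), c_t)


def quantize_4x4(x: Matrix, qp: int, inter: bool = True) -> Matrix:
    """|Z| = (|W| * M(QP%6; r) + f) >> qbits, with sign(Z) = sign(W)."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // 6 if inter else (1 << qbits) // 3
    w = core_transform(x)
    z = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            m = M_TABLE[2 - (i % 2) - (j % 2)][qp % 6]
            level = (abs(w[i][j]) * m + f) >> qbits
            z[i][j] = level if w[i][j] >= 0 else -level
    return z


def is_zero_block(x: Matrix, qp: int, inter: bool = True) -> bool:
    """True when every quantized coefficient of the 4x4 block is zero."""
    return all(v == 0 for row in quantize_4x4(x, qp, inter) for v in row)
```

For a flat residual of all ones, only $W_{00} = 16$ is nonzero; at QP = 0 it quantizes to level 6, while at QP = 28 the whole block becomes a zero-block. Skipping this computation for blocks known in advance to be zero is exactly what early zero-block detection aims at.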
Sousa [6] derived a simple sufficient condition under which every quantized coefficient of an 8 × 8 DCT becomes zero. To derive the corresponding sufficient condition for the 4 × 4 DCT, the PF factor is absorbed back into the core 2D transform, and the 4 × 4 DCT coefficients are rewritten as $Y = AXA^T$ with
$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix},$$
where $a = 1/2$, $b = \sqrt{2/5}$, and $c = (1/2)\sqrt{2/5}$. Each coefficient can therefore be written as $Y_{ij} = \sum_{m=0}^{3}\sum_{n=0}^{3} A_{im}A_{jn}\,x_{mn}$, and since $|A_{im}A_{jn}| \le b^2 = 2/5$,
$$|Y_{ij}| \le \frac{2}{5}\, SAD_{4\times4} \qquad (10)$$
for all DCT coefficients. For inter-block encoding, a DCT coefficient is quantized to zero when
$$\left\lfloor \frac{|Y_{ij}|}{Q_{step}} + \frac{1}{6} \right\rfloor = 0, \quad\text{that is,}\quad |Y_{ij}| < \frac{5}{6}\,Q_{step}. \qquad (11)$$
From (10) and (11), it is easy to show that the 4 × 4 DCT is a zero-block if the sum of absolute differences $SAD_{4\times4}$ satisfies
$$SAD_{4\times4} < \frac{25}{12}\,Q_{step} \approx 2.08\,Q_{step}.$$
This is Sousa's sufficient condition for zero-block detection. Moon et al. [7] derived a more precise sufficient condition by examining the integer 4 × 4 transform and quantization of H.264/AVC, summarized as follows: (1) if $SAD_{4\times4} \le T_0$, the 4 × 4 DCT is a zero-block; (2) if $T_0 < SAD_{4\times4} \le \min\{T_0 + \gamma/2,\ T_1\}$, the 4 × 4 DCT is also a zero-block, where the parameters $T_1$ and $\gamma$ are derived in [7]. Interestingly, $T_0$ is exactly identical to Sousa's condition. Unlike $T_0$, condition (2) varies with the residual samples $x_{ij}$; an extensive study indicates that it ranges from $2.2Q_{step}$ to $2.5Q_{step}$, slightly higher than Sousa's condition ($2.08Q_{step}$).
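The bound in (10) is attained, which is why Sousa's threshold cannot be raised without losing sufficiency. The sketch below checks this with a floating-point model of inter quantization, $\lfloor |Y_{ij}|/Q_{step} + 1/6 \rfloor$, rather than the integer JM path; the helper names are hypothetical. A residual with a single nonzero sample at (0, 0) has $SAD_{4\times4} = |x_{00}|$ and $|Y_{11}| = b^2 |x_{00}|$.

```python
import math

a = 0.5
b = math.sqrt(2 / 5)
c = b / 2  # c = (1/2) * sqrt(2/5)

# Scaled transform with the post-scaling factor absorbed, so that Y = A X A^T.
A = [[a, a, a, a],
     [b, c, -c, -b],
     [a, -a, -a, a],
     [c, -b, b, -c]]

# Sousa-type factor: worst-case weight max|A_im * A_jn| = b^2 = 2/5 combined with (11).
SOUSA_FACTOR = (5 / 6) / b ** 2  # 25/12, about 2.08


def transform(x):
    """Return Y = A X A^T for a 4x4 residual block x."""
    ax = [[sum(A[i][k] * x[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(ax[i][k] * A[j][k] for k in range(4)) for j in range(4)] for i in range(4)]


def is_zero_block(x, qstep):
    """Inter quantization model: every floor(|Y_ij| / Qstep + 1/6) equals zero."""
    return all(abs(v) / qstep + 1 / 6 < 1 for row in transform(x) for v in row)


def single_sample(value):
    """Residual with one nonzero sample at (0, 0); then |Y_11| = b^2 * SAD."""
    x = [[0] * 4 for _ in range(4)]
    x[0][0] = value
    return x


qstep = 20.0  # Qstep at QP = 30
print(SOUSA_FACTOR * qstep)                     # threshold, about 41.67
print(is_zero_block(single_sample(41), qstep))  # True: SAD = 41 is below the threshold
print(is_zero_block(single_sample(42), qstep))  # False: SAD = 42 already gives |Z_11| = 1
```

Because a single sample just above $\frac{25}{12}Q_{step}$ already produces a nonzero coefficient, the worst-case bound is essentially tight; larger thresholds such as Moon's condition (2) must exploit information about how the residual is distributed.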

Zero-block detection capability and computation reduction in DCT/Quantization
The various thresholds for zero-block detection are plotted as a function of QP in Figure 1. Both Sousa's and Moon's conditions are theoretically sufficient, whereas the thresholds $3.5Q_{step}$ and $5Q_{step}$ are not. The zero-block detection capability of all these thresholds, evaluated on the news and paris sequences, is plotted in Figure 2. Although Sousa's and Moon's conditions are theoretically sufficient, they detect fewer zero-blocks than the other two thresholds. The threshold $5Q_{step}$ gives the best zero-block detection capability, but it also detects numerous improper zero-blocks, which can lead to severe performance degradation. The percentage of zero-blocks detected improperly with these two non-sufficient thresholds is shown in Figure 3. For QP = 16, fewer than 1% improper zero-blocks were found with the ensemble-average threshold $3.5Q_{step}$, compared with more than 9% with the threshold $5Q_{step}$.
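To make the difference between sufficient and non-sufficient thresholds concrete, the following sketch counts improper zero-blocks, that is, blocks with $SAD_{4\times4} < k\,Q_{step}$ whose quantized coefficients are not all zero. It uses the same floating-point inter-quantization model as above and uniformly random residuals, which are a hypothetical stand-in for real sequence data; the printed rates will not match Figure 3, and only their ordering is meaningful.

```python
import math
import random

a, b = 0.5, math.sqrt(2 / 5)
c = b / 2
# Scaled 4x4 transform with the post-scaling factor absorbed: Y = A X A^T
A = [[a, a, a, a], [b, c, -c, -b], [a, -a, -a, a], [c, -b, b, -c]]


def transform(x):
    ax = [[sum(A[i][k] * x[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(ax[i][k] * A[j][k] for k in range(4)) for j in range(4)] for i in range(4)]


def is_zero_block(x, qstep):
    """True if every inter-quantized coefficient floor(|Y|/Qstep + 1/6) is zero."""
    return all(abs(v) / qstep + 1 / 6 < 1 for row in transform(x) for v in row)


def improper_rate(blocks, qstep, factor):
    """Fraction of blocks flagged by SAD_4x4 < factor * Qstep that are not zero-blocks."""
    improper = 0
    for x in blocks:
        sad = sum(abs(v) for row in x for v in row)
        if sad < factor * qstep and not is_zero_block(x, qstep):
            improper += 1
    return improper / len(blocks)


random.seed(0)
# Hypothetical test data: uniformly random residual samples in [-6, 6].
blocks = [[[random.randint(-6, 6) for _ in range(4)] for _ in range(4)] for _ in range(2000)]

if __name__ == "__main__":
    for factor in (25 / 12, 3.5, 5.0):
        print(f"{factor:.2f} Qstep: {improper_rate(blocks, 20.0, factor):.2%} improper")
```

The sufficient factor 25/12 always yields zero improper detections, and the rate can only grow as the factor increases, since a larger threshold flags a superset of the same blocks.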
To evaluate the conditions above for early zero-block detection, an experiment was performed on the DCT/Q calculation. Table 1 lists the savings in total encoding time from DCT/Q, together with the PSNR loss, on the news sequence for different QPs. Integer transform and quantization occupy only about 5% of the total encoding time. No loss in either PSNR or bit-rate was found for Sousa's and Moon's conditions. The threshold $3.5Q_{step}$ achieves a significant reduction in DCT/Q computation with a negligible PSNR loss: up to 3% of the total encoding time is saved, with a PSNR loss of only 0.005 dB, for QP = 48. The threshold $5Q_{step}$ [4], however, suffers severe PSNR degradation due to improper zero-block detection, although it reduces DCT/Q computation further. Consequently, the threshold $5Q_{step}$ is not analyzed further.

CONVENTIONAL METHODS FOR ADOPTING ZERO-BLOCK DETECTION IN THE UMHEXAGONS ALGORITHM
In the early termination method for motion estimation, each $SAD^{ij}_{4\times4}$ within $SAD_{M\times N}$ is compared with a threshold; if all $SAD^{ij}_{4\times4}$ satisfy the sufficient or nearly sufficient condition, the motion search stops. In addition, the DCT/Q calculation need not be performed for any 4 × 4 DCT that is a zero-block. This leads to a large reduction in computation. Since the conventional early zero-block detection method requires only a comparison of each $SAD^{ij}_{4\times4}$ with a threshold, it can be applied to any kind of motion search, such as full search and other fast search algorithms. This has been investigated in [4, 5].
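The early termination test described above can be sketched as follows. The sketch assumes that the current and reference blocks are given as 2D lists whose dimensions are multiples of 4, uses the closed-form $Q_{step}$, and defaults to the nearly sufficient factor 3.5; the function names are hypothetical and not taken from the JM software.

```python
from typing import List, Sequence

Block = Sequence[Sequence[int]]


def sub_block_sads(cur: Block, ref: Block) -> List[int]:
    """SAD of each 4x4 sub-block of an M x N block, in raster order (M, N multiples of 4)."""
    rows, cols = len(cur), len(cur[0])
    sads = []
    for by in range(0, rows, 4):
        for bx in range(0, cols, 4):
            sads.append(sum(abs(cur[y][x] - ref[y][x])
                            for y in range(by, by + 4)
                            for x in range(bx, bx + 4)))
    return sads


def zero_block_flags(cur: Block, ref: Block, qp: int, factor: float = 3.5) -> List[bool]:
    """Per-sub-block flag: True means SAD_4x4 < factor * Qstep, so DCT/Q can be skipped."""
    qstep = 0.625 * 2 ** (qp / 6)
    return [s < factor * qstep for s in sub_block_sads(cur, ref)]


def can_terminate(cur: Block, ref: Block, qp: int, factor: float = 3.5) -> bool:
    """Early termination of the motion search: every 4x4 sub-block must be flagged."""
    return all(zero_block_flags(cur, ref, qp, factor))
```

The same per-sub-block flags serve both purposes: all flags set stops the search, and each individual flag lets the encoder skip DCT/Q for that 4 × 4 block.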
In this section, we apply the various zero-block detection methods to the UMHexagonS algorithm and investigate their performance. The simulation conditions are listed in Table 2. Table 3 shows the average number of search points per block on the news sequence for different QPs, achieved with the various zero-block detection thresholds. The average number of search points decreases as the threshold increases. For the news sequence at QP = 48, the zero-block detection approach with threshold $3.5Q_{step}$ saves up to 78% of the average search points in motion estimation (from 14.09 down to 3.04), far more than the two sufficient conditions (9.11 and 7.26 search points, respectively). The average PSNR loss, bit-rate increment, and motion estimation time saving versus QP for the various thresholds are compared in Table 4. Early zero-block detection with the nearly sufficient condition (threshold $3.5Q_{step}$) significantly outperforms the other thresholds in terms of computation at all bit-rates: up to 56% of the motion estimation time is saved for QP = 48 compared with the UMHexagonS algorithm.
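The search-point savings quoted above follow directly from the Table 3 values for QP = 48. The short calculation below reproduces them; the labels for the two sufficient conditions simply follow the order in which their values are quoted in the text.

```python
def saving_percent(baseline: float, reduced: float) -> float:
    """Percentage of search points saved relative to the baseline."""
    return 100 * (baseline - reduced) / baseline


BASELINE = 14.09  # UMHexagonS without zero-block detection, news, QP = 48
CASES = [("sufficient condition 1", 9.11),
         ("sufficient condition 2", 7.26),
         ("3.5 Qstep", 3.04)]

for label, points in CASES:
    print(f"{label}: {saving_percent(BASELINE, points):.1f}% fewer search points")
# sufficient condition 1: 35.3% fewer search points
# sufficient condition 2: 48.5% fewer search points
# 3.5 Qstep: 78.4% fewer search points
```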
The PSNR degradation, however, becomes severe for low bit-rate coding, that is, at high QP. Table 5 lists the PSNR loss on several video sequences for QP = 48. The conventional zero-block detection incurs a PSNR loss of 0.212 dB on the foreman sequence. This phenomenon is illustrated in Figure 4, which shows the SAD error surface and the corresponding search iterations of the UMHexagonS algorithm in mode 16 × 16 for one macroblock (the 42nd MB of the 10th frame) of the foreman sequence. The UMHexagonS algorithm requires 110 search points to find the minimum error ($SAD_{16\times16} = 864$ at the 26th iteration). When the conventional zero-block detection method is added to the UMHexagonS algorithm with QP = 30 (threshold $3.5Q_{step} = 70$), the search also stops at the 26th iteration and still finds the minimum error. However, when QP is increased to 48 (threshold $3.5Q_{step} = 560$), the search stops at the first iteration, where $SAD_{16\times16} = 1210$, which leads to severe performance degradation. The higher the quantization parameter, the more severe the degradation.
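The effect of QP on this macroblock can be illustrated numerically. The per-sub-block SADs of the foreman macroblock are not given, so the sketch below makes the hypothetical assumption that each $SAD_{16\times16}$ is spread evenly over its sixteen 4 × 4 sub-blocks; it only shows how the same SAD can pass the test at QP = 48 but not at QP = 30.

```python
def qstep(qp: int) -> float:
    return 0.625 * 2 ** (qp / 6)


def stops_early(sub_sads, qp: int, factor: float = 3.5) -> bool:
    """Conventional rule: stop when every 4x4 sub-block SAD is below factor * Qstep."""
    thr = factor * qstep(qp)
    return all(s < thr for s in sub_sads)


def even_split(sad16: float):
    """Hypothetical: spread SAD_16x16 evenly over its sixteen 4x4 sub-blocks."""
    return [sad16 / 16] * 16


print(3.5 * qstep(30), 3.5 * qstep(48))  # 70.0 560.0
print(stops_early(even_split(1210), 30))  # False: about 75.6 per sub-block exceeds 70
print(stops_early(even_split(1210), 48))  # True: the first iteration already passes
print(stops_early(even_split(864), 30))   # True: 54 per sub-block is below 70
```

At QP = 48 the per-block threshold is eight times larger than at QP = 30, so a first-iteration candidate that is clearly not the minimum can already satisfy it.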

IMPROVED UMHEXAGONS ALGORITHM
The conventional early zero-block detection technique does not give satisfactory coding performance when applied to the UMHexagonS algorithm with large quantization step sizes.
In this section, we modify the way the UMHexagonS algorithm uses early zero-block detection in order to achieve high coding efficiency. Full-search statistics indicate that global minima are concentrated near the search center, especially at the zero MV, and also along the horizontal direction (average 27%) and the vertical direction (average 18%). To improve coding performance, early zero-block detection is not applied at these search points. In addition, the motion search does not stop immediately when the nearly sufficient condition is satisfied; instead, a diamond search is performed to find a smaller SAD. The improved algorithm is illustrated in Figure 5 and summarized as follows.
Step 1. Predict the initial search point.
Step 3. Perform the uneven multi-hexagon-grid search. If all $SAD^{ij}_{4\times4}$ satisfy the nearly sufficient condition in (16), the search in this step stops and jumps directly to the diamond search in Step 4.
Step 4. Perform the extended hexagon-based search. Similarly, if all $SAD^{ij}_{4\times4}$ satisfy the nearly sufficient condition during the hexagon search, jump directly to the diamond search.
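The control flow of these steps can be sketched as below. This is a simplified illustration under several assumptions, not the authors' implementation: the cost function, the zero-block test, and the candidate lists are supplied by the caller; Step 2 is assumed to be the unsymmetrical-cross search of UMHexagonS; and each hexagon stage is reduced to a fixed list of candidate points. All names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

MV = Tuple[int, int]
SMALL_DIAMOND = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def diamond_search(start: MV, cost: Callable[[MV], float], max_iter: int = 64) -> MV:
    """Small-diamond refinement: move to the best neighbour until the center is best."""
    best = start
    for _ in range(max_iter):
        cand = min(((best[0] + dx, best[1] + dy) for dx, dy in SMALL_DIAMOND), key=cost)
        if cost(cand) >= cost(best):
            break
        best = cand
    return best


def improved_search(
    cost: Callable[[MV], float],
    zero_block: Callable[[MV], bool],
    predicted: MV,
    cross_points: Sequence[MV],
    hexagon_stages: Sequence[Sequence[MV]],
) -> MV:
    """Early zero-block termination is applied only inside the hexagon stages."""
    # Step 1: predicted MV and zero MV; no zero-block test at these points.
    best = min([predicted, (0, 0)], key=cost)
    # Step 2 (assumed cross search): horizontal/vertical points, again no zero-block test.
    for mv in cross_points:
        if cost(mv) < cost(best):
            best = mv
    # Steps 3-4: if a newly found best point passes the zero-block test, skip the
    # remaining hexagon points and jump straight to the diamond search.
    for stage in hexagon_stages:
        for mv in stage:
            if cost(mv) < cost(best):
                best = mv
                if zero_block(best):
                    return diamond_search(best, cost)
    # Without early termination, the diamond search still ends the search.
    return diamond_search(best, cost)
```

Because termination only skips hexagon candidates and always hands over to the diamond search, a premature zero-block decision can still be refined toward a smaller SAD, which is the behaviour the modified algorithm relies on.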
The average PSNR loss, bit-rate increment, and ME time saving of the improved algorithm versus QP are compared with those of the UMHexagonS algorithm in Table 6. The improved algorithm saves up to 55% of the ME computation while maintaining very good rate-distortion performance. Compared with the conventional early zero-block detection method, it gains 0.128 dB in PSNR on the foreman sequence for QP = 48, at the cost of a slight increase in computation.

CONCLUSION
In this paper, we modified the early termination of the UMHexagonS algorithm to avoid severe performance degradation at high QP. In addition, we derived a nearly sufficient condition for zero-block detection of 4 × 4 DCT coefficients after quantization, based on the ensemble average of all 4 × 4 DCT coefficients. This condition was shown to have excellent zero-block detection capability, while improper zero-block detection is negligible. Early zero-block detection with the nearly sufficient condition (threshold $3.5Q_{step}$) was then applied to both the motion search and the DCT/Q calculation in a fast motion estimation algorithm (UMHexagonS). The simulation results show that a significant computation reduction (up to 55%) can be achieved with negligible performance degradation compared with the UMHexagonS algorithm.

Figure 5: Early zero-block detection for motion search and DCT/Q.

Table 1: Encoding time saving and PSNR loss in DCT/Q.
If the ensemble average of the DCT coefficient magnitudes $|Y_{ij}|$, denoted $|Y_{av}|$, is applied to (11), the following upper bound for zero-block detection is obtained: $SAD_{4\times4} < 3.5\,Q_{step}$.
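One reading of the ensemble-average argument that reproduces the 3.5 figure is sketched below: the worst-case weight $b^2$ in (10) is replaced by the mean of $|A_{im}A_{jn}|$ over all indices, and the result is combined with (11). The paper's exact definition of $|Y_{av}|$ is not reproduced in this section, so this averaging is an assumption, offered as a consistency check rather than the authors' derivation.

```python
import math

a = 0.5
b = math.sqrt(2 / 5)
c = b / 2

# Magnitudes of the scaled transform matrix A: rows {a,a,a,a}, {b,c,c,b}, {a,a,a,a}, {c,b,b,c}.
A_ABS = [[a, a, a, a], [b, c, c, b], [a, a, a, a], [c, b, b, c]]

# Worst-case weight of a residual sample in Y_ij (Sousa): max |A_im * A_jn| = b^2.
worst_weight = max(v for row in A_ABS for v in row) ** 2
# Assumed ensemble average: mean of |A_im * A_jn| over all i, j, m, n,
# which factorizes into (mean |A_im|)^2 = ((2a + b + c) / 4)^2.
mean_entry = sum(v for row in A_ABS for v in row) / 16
mean_weight = mean_entry ** 2

# Zero-block condition (11) for inter blocks: |Y| < (5/6) Qstep.
sousa_factor = (5 / 6) / worst_weight   # about 2.08
nearly_factor = (5 / 6) / mean_weight   # about 3.51

print(f"{sousa_factor:.2f} Qstep vs {nearly_factor:.2f} Qstep")  # 2.08 Qstep vs 3.51 Qstep
```

Since the average weight (about 0.237) is much smaller than the worst case (0.4), the resulting threshold is roughly 1.7 times larger than Sousa's; it is no longer guaranteed for every residual, which is why the condition is only nearly sufficient.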

Table 3: Average search points per frame achieved by various thresholds.
Eight commonly used video sequences with different motion content (four QCIF sequences: foreman, carphone, football, and coastguard; four CIF sequences: stefan, mobile, paris, and tempete) were simulated using the full search algorithm with a search range of w = ±16. The experimental results indicate that a large proportion of global minima lie near the search center, especially at the zero MV (0, 0) (average 38%).
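The statistic above comes from locating each block's global minimum by full search and classifying its motion vector. A minimal sketch of that bookkeeping follows; it assumes that "horizontal" and "vertical" mean motion vectors with a zero vertical or zero horizontal component, respectively, and it takes an arbitrary cost function in place of real block SADs. All names are hypothetical.

```python
from collections import Counter
from typing import Callable, Tuple

MV = Tuple[int, int]


def full_search(cost: Callable[[MV], float], w: int = 16) -> MV:
    """Exhaustive search over [-w, w] x [-w, w]; ties go to the shorter vector."""
    candidates = [(dx, dy) for dy in range(-w, w + 1) for dx in range(-w, w + 1)]
    return min(candidates, key=lambda mv: (cost(mv), abs(mv[0]) + abs(mv[1])))


def classify(mv: MV) -> str:
    """Assumed categories: zero MV, purely horizontal (dy = 0), purely vertical (dx = 0)."""
    if mv == (0, 0):
        return "zero"
    if mv[1] == 0:
        return "horizontal"
    if mv[0] == 0:
        return "vertical"
    return "other"


if __name__ == "__main__":
    # Hypothetical cost surfaces standing in for the SAD surfaces of three blocks.
    surfaces = [lambda mv: abs(mv[0]) + abs(mv[1]),
                lambda mv: abs(mv[0] - 5) + abs(mv[1]),
                lambda mv: abs(mv[0] - 2) + abs(mv[1] + 3)]
    print(Counter(classify(full_search(f)) for f in surfaces))
```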

Table 4: Performance comparison on the news sequence.

Table 5: PSNR loss using the nearly sufficient condition for QP = 48.

Table 6: PSNR loss, bit-rate increment, and ME time saving.