Elsevier

Signal Processing

Volume 86, Issue 10, October 2006, Pages 2877-2886

Modified predictive line search for block motion estimation on multimedia processors

https://doi.org/10.1016/j.sigpro.2005.11.017

Abstract

The predictive line search (PLS) algorithm has already been applied successfully to block motion estimation on multimedia processors, where it exploits their flexibility and parallelism. This paper presents an efficient algorithm, named modified PLS (MPLS), that avoids the redundant computation of PLS. In addition, MPLS uses an adaptive computation distribution mechanism that allocates the available computation of the multimedia processor to the blocks and frames of a video sequence in order to achieve higher output quality. Experimental results indicate that, depending on the motion activity in the video, MPLS achieves better output quality than PLS and requires up to 23% less computation on average.

Introduction

Advances in computer processing power make software-based video codecs (encoder and decoder) a feasible solution, especially on multimedia processors (MPs), owing to their programmable flexibility and parallel architectures [1], [2], [3]. Motion estimation and motion compensation are the primary means of removing the temporal redundancy among adjacent frames of a video sequence. The most popular technique for motion estimation is the block-matching algorithm (BMA). However, this algorithm typically consumes 60–80% of the processing power required to encode a video and has a strong impact on the visual output quality. Therefore, the design of the BMA is one of the key issues for MP-based video codecs.

In a typical BMA, each frame is divided into non-overlapping blocks of 16×16 pixels. For each block, the encoder searches a search area in the reference frame for the motion vector (MV) that yields the minimum matching difference, measured for example by the sum of absolute differences (SAD). The full-search (FS) algorithm, which exhaustively examines every candidate MV in the search area, is typically considered the optimal BMA. However, the computational complexity of FS is often too high to meet real-time requirements. Many fast BMAs have been proposed to reduce the computational complexity of motion estimation, such as the three-step search (TSS) [4], the new three-step search (NTSS) [5], the four-step search (FSS) [6], the diamond search (DS) [7], and the predictive motion vector field adaptive search technique (PMVFAST) [8]. These fast BMAs are designed to examine as few candidates (checking points) as possible without a significant SAD degradation. However, minimizing the number of checking points is not the right goal for MPs, because they can process multiple pixels in parallel and have special instructions that compute a SAD in a single clock cycle. Based on these features of MPs, Y. W. Huang et al. proposed an efficient motion estimation algorithm, named predictive line search (PLS), for an MPEG-4 encoding system in [9].
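The SAD criterion and the exhaustive full search described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `sad` and `full_search`, the list-of-rows frame layout, and the rule of skipping candidates that fall outside the frame are all assumptions made for illustration. A real MP implementation would replace the inner loops with parallel SAD instructions.

```python
def sad(cur, ref, bx, by, dx, dy, n=16):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy).
    Frames are assumed to be lists of rows of integer pixel values."""
    total = 0
    for y in range(n):
        cur_row = cur[by + y]
        ref_row = ref[by + dy + y]
        for x in range(n):
            total += abs(cur_row[bx + x] - ref_row[bx + dx + x])
    return total


def full_search(cur, ref, bx, by, search_range, n=16):
    """Exhaustively test every displacement in [-search_range, search_range]
    in both directions and return (best_mv, best_sad)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Assumption: candidates reaching outside the reference frame are skipped.
            inside = (0 <= bx + dx and bx + dx + n <= width and
                      0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

With a search range of ±r, the full search evaluates up to (2r+1)² candidates per block, which is the cost that fast BMAs and PLS try to reduce.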

PLS is a predictive BMA, a class widely employed in current video coding standards because of its low coding complexity and high output quality. The initial MV of a block in PLS is predicted as the median of the MVs of the three spatially adjacent blocks (left, top, and top-right). Other BMAs use a small square search window that moves both horizontally and vertically toward the current best matching point for the next search. In PLS, by contrast, the horizontal size of the search window, i.e., the search line, is fixed to the length of the search area to exploit the parallelism of MPs, and the search window moves only vertically for the next search. The performance of PLS therefore depends strongly on the horizontal motion activity of objects in the video sequence. The output quality of PLS is poor if an object moves beyond the search line, so the length of the search line is usually set to a large value, such as 32. A longer search line yields higher output quality but also more redundant computation, because a long search line is wasted on objects with small horizontal motion.
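As an illustration of the two ingredients just described, the following Python sketch computes a median MV predictor and evaluates one horizontal search line. It is a sketch under stated assumptions rather than the algorithm of [9]: taking the median component-wise, centering the line on the predictor, and the names `median_mv`, `search_line`, and `cost_fn` are all hypothetical choices made here.

```python
def median_mv(left, top, top_right):
    """Median of three neighbouring motion vectors (dx, dy).
    Assumption: the median is taken separately for each component."""
    xs = sorted(v[0] for v in (left, top, top_right))
    ys = sorted(v[1] for v in (left, top, top_right))
    return (xs[1], ys[1])


def search_line(cost_fn, center, length):
    """Evaluate `length` horizontally adjacent candidates on the row of
    `center` and return ((dx, dy), cost) for the cheapest one.
    `cost_fn(dx, dy)` is a caller-supplied matching cost such as a SAD.
    Assumption: the line is centered on the x component of `center`."""
    cx, cy = center
    start = cx - length // 2
    best = None
    for dx in range(start, start + length):
        cost = cost_fn(dx, cy)
        if best is None or cost < best[1]:
            best = ((dx, cy), cost)
    return best
```

The sketch makes the trade-off visible: a true horizontal displacement outside `[start, start + length)` can never be found on that line, while every extra candidate on a longer line costs one more SAD evaluation whether or not the motion needs it.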

The search line must therefore be shortened to a reasonable length to avoid redundant computation. The ideal length depends on both the parallelism of the MP and the horizontal motion activity of objects. A good candidate for the search length is 16, because most MPs, such as the Equator MAP-CA [1] and Intel SSE3 [2], can process two sets of 16 pixels in parallel, and most horizontal motion falls within this range. Once the search length is set to 16, the search window must also move horizontally to capture large horizontal motion vectors. Based on these observations, we propose an efficient motion estimation algorithm, named modified PLS (MPLS), for MP-based video codecs. Experimental results show that MPLS achieves better output quality than PLS and requires up to 23% less computation on average.

PLS and MPLS stop only when all subsequent checking points have been examined. They perform well for videos whose required computation is less than the available computational power of the MP. However, it is impossible to guarantee that every input video satisfies this condition. An input video may contain motion so complex that the computational power of the MP cannot fully support the required computation. In that case, the quality of the coded video is degraded because some blocks cannot be processed. The quality of the coded video can be improved if the computational power of the MP is controlled effectively. Research on computation-aware (CA) schemes for BMAs studies algorithms that distribute the available computation to each frame and to its blocks to achieve higher output quality [10], [11], [12]. A scalable algorithm based on a particular 3D recursive search BMA was proposed in [10], and an adaptive algorithm that updates the search level was proposed in [11] to distribute computation for nearest-neighbor searching. In [12], several CA schemes were successfully applied to traditional BMAs, such as FS, TSS, NTSS, FSS, and DS. This paper also presents an efficient CA scheme for MPLS.

The remainder of this paper is organized as follows. Section 2 describes the proposed MPLS, including the presented CA scheme. Experimental results are given in Section 3. Some conclusions are drawn in Section 4.

Section snippets

Modified predictive line search

In this section, we first give the main features of PLS and then present the details of the proposed MPLS, since MPLS derives from PLS. Fig. 1 shows the coding structure of PLS, a predictive BMA. The initial MV of a block is predicted as the median of the MVs of the three spatially adjacent blocks (left, top, and top-right). A search window with three lines around the MV predictor is first examined, where the length of the search line, denoted as L, is fixed as the horizontal size

Computation-aware scheme

The available processing power is fundamental to the performance of a codec. Studies on CA schemes for BMAs investigate algorithms that distribute the available computation to blocks and frames so as to achieve higher output quality than is possible without a CA scheme. In general, a CA scheme comprises two phases: (1) frame-level computation distribution and (2) block-level computation allocation. The goal of the first phase is to determine a target computation for each

Analysis of storage and computation overhead

The proposed MPLS has storage and computational overhead over the original PLS.

The frame-level CA mechanism needs three storage locations to hold μ, C_p, and F_p, and two operations (one division and one comparison) to calculate μ. The sliding-window approach requires six storage locations to hold M, F_{i−M}, F_i, R_i, C_consumed^i, and C_target^i. For frame i, four additional operations are necessary: one addition to evaluate F_i; one addition and one comparison to evaluate R_i; and one subtraction to evaluate C_target^i

Experimental results

In this section we present experimental results of motion estimation using PLS and MPLS. The algorithms are applied to three SIF videos (100 frames each of the football, tennis, and M&C (mobile and calendar) sequences) and one QCIF video (300 frames of the foreman sequence). The sequences were selected to cover different motion content, from rapid, as in football, to slow, as in M&C. The block size is fixed at 16×16, and the coded quality of motion compensated videos is measured by

Conclusions

PLS is a good choice for MP-based motion estimation, but it involves some redundant computation. This paper proposes an efficient algorithm, named MPLS, that retains the advantages of PLS while overcoming its drawbacks. In addition, this paper provides a sliding-window approach and a threshold-adaptation mechanism to distribute the available processing power of MP-based codecs to frames and blocks of video sequences, respectively. Experimental results demonstrate that the quality of

References (12)

  • C. Basoglu et al., The Equator MAP-CA™ DSP: an end-to-end broadband signal processor™ VLIW, IEEE Trans. Circuits and Systems Video Tech. (August 2002)
  • V. Lappalainen et al., Overview of research efforts on media ISA extensions and their usage in video coding, IEEE Trans. Circuits and Systems Video Tech. (August 2002)
  • A.R. Iranpour, K. Kuchcinski, Evaluation of SIMD Architecture enhancement in embedded processors for MPEG-4,...
  • T. Koga, K. Iinuma, A. Iijima, T. Ishiguro, Motion-compensated interframe coding for video conferencing, Proceedings of...
  • R. Li et al., A new three-step search algorithm for block motion estimation, IEEE Trans. Circuits and Systems Video Tech. (August 1994)
  • L.M. Po et al., A novel four-step search algorithm for fast block motion estimation, IEEE Trans. Circuits and Systems Video Tech. (1996)
There are more references available in the full text version of this article.
