Optimizing the H.264/AVC Video Encoder Application Structure for Reconfigurable and Application-Specific Platforms

Journal of Signal Processing Systems

Abstract

The H.264/AVC video coding standard features diverse computational hot spots that must be accelerated to cope with its significantly increased complexity compared to previous standards. In this paper, we propose an optimized application structure (i.e., the arrangement of the functional components of an application, which determines its data flow properties) for the H.264 encoder that is suitable for application-specific and reconfigurable hardware platforms. Our proposed structural optimization, which reduces the computation required for Motion Compensated Interpolation, is independent of the hardware platform used for execution. On a MIPS processor, we achieve an average speedup of approximately 60× for Motion Compensated Interpolation. Our proposed application structure reduces the overhead for reconfigurable platforms by distributing the hardware requirements among the functional blocks. This increases the amount of reconfigurable hardware available per Special Instruction (within a functional block), which leads to a 2.84× performance improvement of the complete encoder compared to a Benchmark Application with standard optimizations. We evaluate our application structure on four different hardware platforms.

Author information

Corresponding author

Correspondence to Muhammad Shafique.

Additional information

This paper is an extended version of our ESTIMedia’07 paper. We have significantly extended that paper (by more than 50%) by adding (a) detailed discussions of the proposed optimizations and a detailed diagram of the final optimized application structure, (b) a new section presenting a comprehensive data flow diagram and data structure formats with a memory-related discussion, (c) a Special Instruction for the De-blocking Filter, (d) extended results with new figures and tables, (e) a new section describing the optimization steps used to create the Benchmark Application, (f) a new sub-section with a functional description of all Special Instructions and their constituent data paths, and (g) an extended overview of the different hardware platforms used for benchmarking.

About this article

Cite this article

Shafique, M., Bauer, L. & Henkel, J. Optimizing the H.264/AVC Video Encoder Application Structure for Reconfigurable and Application-Specific Platforms. J Sign Process Syst 60, 183–210 (2010). https://doi.org/10.1007/s11265-008-0304-5
