Extracting More Parallelism: the 3D-Wave

Scalable Parallel Programming Applied to H.264/AVC Decoding

Abstract

If higher performance is required, a parallel application developer might have to extract more parallelism than initially employed in the application. To illustrate this step, this chapter presents a parallel implementation of H.264 decoding on a shared-memory system that scales to a large number of cores. The application implements the dynamic 3D-Wave algorithm, which exploits intra-frame MB-level parallelism as well as inter-frame MB-level parallelism. The 3D-Wave algorithm is based on the observation that inter-frame dependencies have a limited spatial range, i.e., that motion vectors are typically short. Experimental results obtained using a simulator of a many-core architecture containing NXP TriMedia TM3270 embedded processors show that the implementation scales very well, achieving a speedup of more than 50 on a 64-core processor for a 25-frame FHD sequence.
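The dependency structure named in the abstract can be illustrated with a small scheduling sketch. It is not the chapter's implementation. It assumes unit decode time per macroblock (MB), unlimited cores, each frame referencing only the previous frame, and a fixed motion-vector reach `mv_range` measured in MBs; the function names and `mv_range` are hypothetical. Intra-frame dependencies use the usual H.264 neighbours (left, up-left, up, up-right).

```python
from collections import Counter

# Intra-frame neighbours an MB waits for: left, up-left, up, up-right.
INTRA_DEPS = ((-1, 0), (-1, -1), (0, -1), (1, -1))


def earliest_slots(width, height, frames, mv_range):
    """Earliest time slot in which each MB could be decoded.

    Assumptions (illustrative only): unit decode time, unlimited cores,
    each frame references only the previous one, and motion vectors reach
    at most `mv_range` MBs in any direction.
    """
    slot = {}
    for f in range(frames):
        for y in range(height):
            for x in range(width):
                deps = []
                # Intra-frame dependencies (2D wave inside one frame).
                for dx, dy in INTRA_DEPS:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        deps.append(slot[(f, nx, ny)])
                # Inter-frame dependencies: only the reference area that a
                # motion vector of limited range can reach.
                if f > 0:
                    for ny in range(max(0, y - mv_range),
                                    min(height, y + mv_range + 1)):
                        for nx in range(max(0, x - mv_range),
                                        min(width, x + mv_range + 1)):
                            deps.append(slot[(f - 1, nx, ny)])
                slot[(f, x, y)] = 1 + max(deps, default=-1)
    return slot


def peak_parallelism(slot):
    """Largest number of MBs that share one time slot."""
    return max(Counter(slot.values()).values())


if __name__ == "__main__":
    slots = earliest_slots(width=8, height=6, frames=4, mv_range=1)
    print("time slots needed:", max(slots.values()) + 1)
    print("peak parallel MBs:", peak_parallelism(slots))
```

Under these assumptions, a frame can start before its reference frame is complete, so MBs from several frames are in flight at once. Increasing `mv_range` delays the start of each following frame and reduces the overlap, which is why short motion vectors matter for this approach.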

Author information

Corresponding author

Correspondence to Ben Juurlink.

Copyright information

© 2012 The Author(s)

About this chapter

Cite this chapter

Juurlink, B., Alvarez-Mesa, M., Chi, C.C., Azevedo, A., Meenderinck, C., Ramirez, A. (2012). Extracting More Parallelism: the 3D-Wave. In: Scalable Parallel Programming Applied to H.264/AVC Decoding. SpringerBriefs in Computer Science. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-2230-3_5

  • DOI: https://doi.org/10.1007/978-1-4614-2230-3_5

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-2229-7

  • Online ISBN: 978-1-4614-2230-3

  • eBook Packages: Computer Science (R0)
