DOI: 10.1145/2465554.2465557
research-article

Practical speculative parallelization of variable-length decompression algorithms

Published: 25 October 2018

ABSTRACT

Variable-length coding is widely used for efficient data compression. Typically, the compressor splits the original data into blocks and compresses each block with variable-length codes, producing compressed blocks of varying length. Although the compressor can easily exploit ample block-level parallelism, it is much more difficult to extract such coarse-grain parallelism from the decompressor, because a block boundary cannot be located until decompression of the previous block has completed. This paper presents novel algorithms to efficiently predict block boundaries and a runtime system, called SDM, that enables efficient block-level parallel decompression. The SDM execution model features speculative pipelining with three stages: Scanner, Decompressor, and Merger. The scanner stage employs a high-confidence prediction algorithm that finds compressed block boundaries without fully decompressing individual blocks. The predicted boundaries are communicated to the parallel decompressor stage, in which multiple blocks are decompressed in parallel. The merger stage then merges the decompressed blocks in order to produce the final output. The SDM runtime is specialized to execute this pipeline correctly and efficiently on resource-constrained embedded platforms. With SDM we effectively parallelize three production-grade variable-length decompression algorithms (zlib, bzip2, and H.264), with maximum speedups of 2.50× and 8.53× (and geometric mean speedups of 1.96× and 4.04×) on 4-core and 36-core embedded platforms, respectively.
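The three-stage structure described in the abstract can be illustrated with a toy sketch in Python. This is not the paper's prediction algorithm or runtime: it assumes a hypothetical input format (independently compressed zlib streams concatenated back to back), a naive scanner that treats every occurrence of the zlib header bytes as a candidate boundary, and helper names (`compress_blocks`, `scan`, `decompress_at`, `sdm_decompress`) invented for illustration. A thread pool stands in for the parallel decompressor stage, and the merger accepts a speculative result only if the previous block really ended at that candidate's offset.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: every block is compressed with zlib's default settings, so all
# blocks share the same 2-byte header. We read it from an empty stream.
MAGIC = zlib.compress(b"")[:2]

def compress_blocks(data, block_size):
    """Toy compressor: independent zlib streams, concatenated back to back."""
    return b"".join(zlib.compress(data[i:i + block_size])
                    for i in range(0, len(data), block_size))

def scan(stream):
    """Scanner: predict block starts without decompressing anything.
    Every MAGIC occurrence is a candidate; some are false positives."""
    starts, pos = [], stream.find(MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = stream.find(MAGIC, pos + 1)
    return starts

def decompress_at(stream, start):
    """Decompressor: speculatively decode one block starting at `start`.
    Returns (output, end_offset), or None if the guess was not a valid block."""
    d = zlib.decompressobj()
    try:
        out = d.decompress(stream[start:])
    except zlib.error:
        return None
    if not d.eof:
        return None
    # unused_data holds the bytes that follow the end of this zlib stream.
    return out, len(stream) - len(d.unused_data)

def sdm_decompress(stream, workers=4):
    starts = scan(stream)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(zip(starts,
                           pool.map(lambda s: decompress_at(stream, s), starts)))
    # Merger: walk the true boundaries in order. A speculative result is used
    # only when its start equals the verified end of the previous block;
    # results at mispredicted offsets are never consulted.
    out, pos = [], 0
    while pos < len(stream):
        res = results.get(pos)
        if res is None:
            res = decompress_at(stream, pos)  # sequential fallback
        if res is None:
            raise ValueError(f"no valid block at offset {pos}")
        chunk, pos = res
        out.append(chunk)
    return b"".join(out)

if __name__ == "__main__":
    original = bytes(range(256)) * 40
    packed = compress_blocks(original, 1000)
    print(sdm_decompress(packed) == original)
```

The sketch shows only the control flow: speculation is cheap to validate because each decoded block reports where it ended, so the merger can confirm or reject the next predicted boundary without extra work. Whether the thread pool yields any real speedup depends on the Python runtime and block sizes, and the sketch makes no claim about performance.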


Published in

LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
June 2013
184 pages
ISBN: 9781450320856
DOI: 10.1145/2491899

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

LCTES '13 paper acceptance rate: 16 of 60 submissions (27%). Overall acceptance rate: 116 of 438 submissions (26%).
