research-article

Practical speculative parallelization of variable-length decompression algorithms

Authors:
Hakbeom Jang, Sungkyunkwan University, Suwon, South Korea
Channoh Kim, Sungkyunkwan University, Suwon, South Korea
Jae W. Lee, Sungkyunkwan University, Suwon, South Korea

LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, June 2013, pages 55–64. https://doi.org/10.1145/2465554.2465557

Published: 25 October 2018


ABSTRACT

Variable-length coding is widely used for efficient data compression. Typically, the compressor splits the original data into blocks and compresses each block with variable-length codes, producing variable-length compressed blocks. Although the compressor can easily exploit ample block-level parallelism, it is much more difficult to extract such coarse-grain parallelism from the decompressor, because a block boundary cannot be located until decompression of the previous block is completed. This paper presents novel algorithms that efficiently predict block boundaries, together with a runtime system, called SDM, that enables efficient block-level parallel decompression. The SDM execution model features speculative pipelining with three stages: Scanner, Decompressor, and Merger. The scanner stage employs a high-confidence prediction algorithm that finds compressed block boundaries without fully decompressing individual blocks. This information is communicated to the parallel decompressor stage, in which multiple blocks are decompressed in parallel. The merger stage then merges the decompressed blocks in order to produce the final output. The SDM runtime is specialized to execute this pipeline correctly and efficiently on resource-constrained embedded platforms. With SDM, we effectively parallelize three production-grade variable-length decompression algorithms (zlib, bzip2, and H.264), achieving maximum speedups of 2.50× and 8.53× (and geometric mean speedups of 1.96× and 4.04×) on 4-core and 36-core embedded platforms, respectively.
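The Scanner/Decompressor/Merger flow described in the abstract can be illustrated with a minimal Python sketch. It assumes a toy container in which each block is an independent zlib stream prefixed by a hypothetical 4-byte marker (`MAGIC`). Real zlib/gzip output carries no such marker, and this is not the SDM runtime or the paper's prediction algorithm; `compress_blocks`, `scan`, `decompress_at`, and `parallel_decompress` are illustrative names. The scanner proposes candidate boundaries by searching for the marker (matches inside compressed data are false positives), a thread pool decompresses every candidate speculatively, and the merger commits a result only when its start offset equals the actual end of the previous block, falling back to sequential decoding otherwise.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

# Hypothetical 4-byte block marker for this toy container (an assumption;
# it is not part of the zlib, bzip2, or H.264 formats).
MAGIC = b"\xa5BLK"


def compress_blocks(data: bytes, block_size: int = 4096) -> bytes:
    """Toy compressor: split input into fixed-size blocks, compress each independently."""
    out = bytearray()
    for i in range(0, len(data), block_size):
        out += MAGIC + zlib.compress(data[i:i + block_size])
    return bytes(out)


def scan(buf: bytes) -> List[int]:
    """Scanner: predict block starts by searching for MAGIC, without decompressing.
    Matches that fall inside compressed payloads are false-positive predictions."""
    candidates, pos = [], buf.find(MAGIC)
    while pos != -1:
        candidates.append(pos)
        pos = buf.find(MAGIC, pos + 1)
    return candidates


def decompress_at(buf: bytes, start: int) -> Optional[Tuple[bytes, int]]:
    """Decompressor: try to decode one block at a predicted start.
    Returns (payload, end_offset), or None if the prediction was wrong."""
    if buf[start:start + len(MAGIC)] != MAGIC:
        return None
    d = zlib.decompressobj()
    try:
        payload = d.decompress(buf[start + len(MAGIC):])
    except zlib.error:
        return None
    if not d.eof:  # stream was truncated, so this was not a real block
        return None
    # Bytes after the end of this zlib stream are left in unused_data.
    return payload, len(buf) - len(d.unused_data)


def parallel_decompress(buf: bytes, workers: int = 4) -> bytes:
    candidates = scan(buf)
    # Speculatively decompress every candidate boundary in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda s: decompress_at(buf, s), candidates)
        speculative = dict(zip(candidates, results))
    # Merger: walk the true boundaries in order. A speculative result is used only
    # if it starts exactly where the previous block actually ended; otherwise the
    # block is re-decoded sequentially from that offset.
    out, pos = bytearray(), 0
    while pos < len(buf):
        result = speculative.get(pos) or decompress_at(buf, pos)
        if result is None:
            raise ValueError(f"corrupt stream at offset {pos}")
        payload, pos = result
        out += payload
    return bytes(out)
```

Because the merger validates each boundary against the previous block's true end before committing, false-positive predictions only waste work and never corrupt the output; for example, `parallel_decompress(compress_blocks(data))` should return `data`.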


Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        1. Parallel programming languages


Published in
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
June 2013
184 pages
ISBN: 9781450320856
DOI: 10.1145/2491899
General Chair: Björn Franke, University of Edinburgh, UK
Program Chair: Jingling Xue, University of New South Wales, Australia
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
compression
embedded systems
multicores
parallelization
runtime
speculation

Acceptance Rates
LCTES '13 paper acceptance rate: 16 of 60 submissions, 27%
Overall acceptance rate: 116 of 438 submissions, 26%