Abstract
To exploit the performance of single instruction multiple data (SIMD) architectures for video coding efficiently, a parallel memory architecture with a power-of-two number of memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types), or to elements at power-of-two intervals, in both the horizontal and vertical directions, which was not possible in previous parallel memory architectures. Area and delay estimates are given for configurations with 4, 8, and 16 memory modules. Synthesis results in a 0.18-μm CMOS technology show that the proposed system achieves a clock frequency of 230 MHz with 16 memory modules at a cost of 19k gates, with read and write latencies of 3 and 2 clock cycles, respectively. The proposed parallel memory architecture is implemented on a video signal processor (VSP). The results show that the VSP enhanced with the proposed architecture achieves a 1.28× speedup for real-time H.264 decoding.
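To make the idea of a skewing scheme concrete, the following is a minimal sketch of a classic linear skew, not the two schemes proposed in the article, whose definitions are not given in this abstract. The function names and the mapping `(x + y) mod N` are assumptions chosen for illustration. The sketch checks whether a set of element coordinates falls into distinct memory modules, which is the condition for a conflict-free parallel access.

```python
def module_index(x, y, num_modules):
    """Hypothetical linear skew: the module that stores element (x, y)."""
    return (x + y) % num_modules


def is_conflict_free(coords, num_modules):
    """True if every coordinate maps to a different memory module."""
    modules = [module_index(x, y, num_modules) for x, y in coords]
    return len(set(modules)) == len(modules)


def row_access(x0, y0, count, stride=1):
    """Coordinates of `count` elements along a row, `stride` apart."""
    return [(x0 + k * stride, y0) for k in range(count)]


def column_access(x0, y0, count, stride=1):
    """Coordinates of `count` elements along a column, `stride` apart."""
    return [(x0, y0 + k * stride) for k in range(count)]


N = 16  # number of memory modules (a power of two, as in the abstract)

# Adjacent elements in a row or a column land in 16 distinct modules.
print(is_conflict_free(row_access(3, 5, N), N))            # True
print(is_conflict_free(column_access(3, 5, N), N))         # True

# A stride of 2 only reaches 8 of the 16 modules, so accesses collide.
print(is_conflict_free(row_access(0, 0, N, stride=2), N))  # False
```

With this simple mapping, unit-stride row and column accesses are conflict-free, but an access at a power-of-two interval is not. That limitation is the kind of case the abstract says the proposed skewing schemes are designed to handle.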
Additional information
Project (No. 2005AA1Z1271) supported by the Hi-Tech Research and Development Program (863) of China
About this article
Cite this article
Peng, Jy., Yan, Xl., Li, Dx. et al. A parallel memory architecture for video coding. J. Zhejiang Univ. Sci. A 9, 1644–1655 (2008). https://doi.org/10.1631/jzus.A0820052
DOI: https://doi.org/10.1631/jzus.A0820052