A parallel memory architecture for video coding

Journal of Zhejiang University-SCIENCE A

Abstract

To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, we propose a parallel memory architecture with a power-of-two number of memory modules. It employs two novel skewing schemes to provide conflict-free access, in both the horizontal and vertical directions, to elements (8-bit and 16-bit data types) that are adjacent or separated by power-of-two intervals, which was not possible in previous parallel memory architectures. Area and delay estimates are given for 4, 8, and 16 memory modules. Under a 0.18-μm CMOS technology, synthesis results show that the proposed system with 16 memory modules achieves a 230 MHz clock frequency at a cost of 19k gates, with read and write latencies of 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that the VSP enhanced with the proposed architecture achieves a 1.28× speedup for H.264 real-time decoding.
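For readers unfamiliar with skewing schemes, the following minimal sketch illustrates the general idea using a classic linear skew. It is not the pair of schemes proposed in the article, which the abstract does not specify. The function names, the `skew` parameter, and the 8×8 test region are assumptions made for illustration only. An access is conflict-free when every requested element maps to a different memory module.

```python
def module_index(row, col, num_modules, skew=1):
    """Map a 2-D element address to a module using a textbook linear skew."""
    return (col + skew * row) % num_modules


def is_conflict_free(addresses, num_modules, skew=1):
    """Return True if all addresses fall in distinct memory modules."""
    modules = [module_index(r, c, num_modules, skew) for r, c in addresses]
    return len(set(modules)) == len(modules)


def row_access(row, col, count, stride=1):
    """Addresses of `count` elements along a row, `stride` columns apart."""
    return [(row, col + k * stride) for k in range(count)]


def column_access(row, col, count, stride=1):
    """Addresses of `count` elements along a column, `stride` rows apart."""
    return [(row + k * stride, col) for k in range(count)]


N = 4  # number of memory modules (a power of two, as in the abstract)

# Adjacent elements: the linear skew serves rows and columns without conflicts.
row_ok = all(is_conflict_free(row_access(r, c, N), N)
             for r in range(8) for c in range(8))
col_ok = all(is_conflict_free(column_access(r, c, N), N)
             for r in range(8) for c in range(8))

# Without skewing (skew=0), a column access hits a single module repeatedly.
unskewed_col_ok = is_conflict_free(column_access(0, 0, N), N, skew=0)

# With a power-of-two stride, this simple linear skew also conflicts.
strided_row_ok = is_conflict_free(row_access(0, 0, N, stride=2), N)

print(row_ok, col_ok, unskewed_col_ok, strided_row_ok)
```

In this sketch the simple skew handles adjacent elements in both directions but fails for a stride of 2, which is the kind of power-of-two interval access the abstract says the proposed schemes support.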

Author information

Corresponding author

Correspondence to Xiao-lang Yan.

Additional information

Project (No. 2005AA1Z1271) supported by the Hi-Tech Research and Development Program (863) of China

About this article

Cite this article

Peng, Jy., Yan, Xl., Li, Dx. et al. A parallel memory architecture for video coding. J. Zhejiang Univ. Sci. A 9, 1644–1655 (2008). https://doi.org/10.1631/jzus.A0820052

  • DOI: https://doi.org/10.1631/jzus.A0820052
