Guidance of Loop Ordering for Reduced Memory Usage in Signal Processing Applications

Journal of Signal Processing Systems

Abstract

Data-dominated signal processing applications are typically described using large, multi-dimensional arrays and loop nests. The order in which array elements are produced and consumed in these loop nests has a huge impact on the amount of memory required during execution. This matters because the size and complexity of the memory hierarchy are the dominating factors for power, performance and chip size in these applications. This paper presents a number of guiding principles for ordering the dimensions in the loop nests. They enable the designer, or design tools, to find the optimal ordering of loop nest dimensions for individual data dependencies in the code. We prove the validity of the guiding principles when no prior restrictions are given regarding fixation of dimensions. If some dimensions are already fixed at given nest levels, this is taken into account when fixing the remaining dimensions. In most cases an optimal ordering is found for this situation as well. The guiding principles can be used in the early design phases to enable minimization of the memory requirement through in-place mapping. We use real-life examples to show how they can be applied to reach a cost-optimized end product. The results show orders of magnitude improvement in memory requirement compared to using the declared array sizes, and penalties of similar magnitude for choosing a suboptimal ordering of loops when in-place mapping is exploited.
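As a rough illustration of the claim that loop ordering affects the memory requirement, the following sketch counts the peak number of simultaneously alive array elements for a single uniform data dependency under two loop orders. It is not the method of the paper. The function name `max_live`, the two-dimensional example and the assumptions that each element is read exactly once and that the read precedes the write within an iteration (so storage can be reused in place) are all hypothetical and chosen only for illustration.

```python
from itertools import product


def max_live(extents, dep, order):
    """Peak number of simultaneously alive array elements (illustrative only).

    extents: loop bounds per array dimension, e.g. (4, 100)
    dep:     dependency vector; the element produced at iteration p is
             consumed (once) at iteration p + dep
    order:   dimension indices from outermost to innermost loop
    """
    # itertools.product advances its last argument fastest, so the last
    # entry of `order` becomes the innermost loop.
    ranges = [range(extents[d]) for d in order]
    live = set()
    peak = 0
    for point in product(*ranges):
        # Map the loop-order tuple back to array coordinates.
        it = [0] * len(extents)
        for pos, d in enumerate(order):
            it[d] = point[pos]
        it = tuple(it)

        # Consume: the element produced at it - dep is dead after this read.
        src = tuple(a - b for a, b in zip(it, dep))
        live.discard(src)

        # Produce: the new element is alive only if a later iteration reads it.
        dst = tuple(a + b for a, b in zip(it, dep))
        if all(0 <= x < n for x, n in zip(dst, extents)):
            live.add(it)
            peak = max(peak, len(live))
    return peak


# Dependency (1, 0): A[i][j] is read when computing A[i+1][j].
print(max_live((4, 100), (1, 0), order=(0, 1)))  # i outermost
print(max_live((4, 100), (1, 0), order=(1, 0)))  # j outermost
```

Under these assumptions, the first call reports 100 alive elements (a full row must be kept until the next row is computed) and the second reports 1, although the declared array holds 400 elements in both cases.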

Abbreviations

DP: dependency part
LR: length ratio
DV: dependency vector
ND: nonspanning dimension
DVP: dependency vector polytope
SD: spanning dimension
ID: iteration domain
UB: upper bound
LB: lower bound

Author information

Corresponding author

Correspondence to Per Gunnar Kjeldsberg.

About this article

Cite this article

Kjeldsberg, P.G., Catthoor, F., Verdoolaege, S. et al. Guidance of Loop Ordering for Reduced Memory Usage in Signal Processing Applications. J Sign Process Syst Sign Image Video Technol 53, 301–321 (2008). https://doi.org/10.1007/s11265-008-0178-6

  • DOI: https://doi.org/10.1007/s11265-008-0178-6
