
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors

  • Conference paper
Languages, Compilers, and Run-Time Systems for Scalable Computers (LCR 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1511)

Abstract

This paper gives an overview of the locality enhancement techniques used by the Jasmine compiler, currently under development at the University of Toronto. These techniques enhance memory locality, cache locality across loop nests (inter-loop-nest cache locality), and cache locality within a loop nest (intra-loop-nest cache locality) in dense-matrix scientific applications. The compiler also exploits machine-specific features to further enhance locality. Experimental evaluation of these techniques on different multiprocessor platforms indicates that they are effective in improving overall benchmark performance; some of the techniques improve parallel execution time by up to a factor of 6.
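The page itself carries no code, but a short sketch can illustrate the two kinds of cache locality the abstract distinguishes. The C fragment below shows loop fusion, a standard transformation for inter-loop-nest locality, and loop tiling, a standard transformation for intra-loop-nest locality. It is a generic illustration only: the function names, array names, and the sizes N and TILE are assumptions for the example, not code generated by or taken from the Jasmine compiler.

```c
/* Illustrative sketch of two generic locality transformations
 * (loop fusion and loop tiling). Not Jasmine compiler output;
 * names and sizes below are assumed for the example. */
#include <stddef.h>

#define N    1024
#define TILE 64   /* placeholder tile size; in practice chosen to fit the cache */

/* Inter-loop-nest locality: two loop nests that both traverse a[]
 * are fused into one, so each a[i] is reused while still in cache. */
void fused(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = 2.0 * a[i];     /* originally the body of the first loop nest  */
        c[i] = a[i] + b[i];    /* originally the body of the second loop nest */
    }
}

/* Intra-loop-nest locality: matrix multiply blocked into TILE x TILE
 * sub-problems so the working set of each block stays cache-resident.
 * Assumes N is a multiple of TILE and c[][] is zero-initialized. */
void tiled_matmul(const double a[N][N], const double b[N][N], double c[N][N])
{
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t kk = 0; kk < N; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t k = kk; k < kk + TILE; k++)
                        for (size_t j = jj; j < jj + TILE; j++)
                            c[i][j] += a[i][k] * b[k][j];
}
```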




Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abdelrahman, T., Manjikian, N., Liu, G., Tandri, S. (1998). Locality Enhancement for Large-Scale Shared-Memory Multiprocessors. In: O’Hallaron, D.R. (ed.) Languages, Compilers, and Run-Time Systems for Scalable Computers. LCR 1998. Lecture Notes in Computer Science, vol 1511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49530-4_24

  • DOI: https://doi.org/10.1007/3-540-49530-4_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65172-7

  • Online ISBN: 978-3-540-49530-7
