
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors

  • Conference paper
Languages, Compilers, and Run-Time Systems for Scalable Computers (LCR 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1511)

Abstract

This paper gives an overview of the locality enhancement techniques used by the Jasmine compiler, currently under development at the University of Toronto. These techniques enhance memory locality, cache locality across loop nests (inter-loop-nest cache locality), and cache locality within a loop nest (intra-loop-nest cache locality) in dense-matrix scientific applications. The compiler also exploits machine-specific features to further enhance locality. Experimental evaluation of these techniques on different multiprocessor platforms indicates that they are effective in improving overall benchmark performance; some of the techniques improve parallel execution time by up to a factor of 6.
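The page itself carries no code, but a short sketch can illustrate the two kinds of cache locality the abstract distinguishes. The C fragment below shows loop fusion, a standard transformation for inter-loop-nest locality, and loop tiling, a standard transformation for intra-loop-nest locality. It is a generic illustration only: the function names, array names, and the sizes N and TILE are assumptions for the example, not code generated by or taken from the Jasmine compiler.

```c
/* Illustrative sketch of two generic locality transformations
 * (loop fusion and loop tiling). Not Jasmine compiler output;
 * names and sizes below are assumed for the example. */
#include <stddef.h>

#define N    1024
#define TILE 64   /* placeholder tile size; in practice chosen to fit the cache */

/* Inter-loop-nest locality: two loop nests that both traverse a[]
 * are fused into one, so each a[i] is reused while still in cache. */
void fused(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = 2.0 * a[i];     /* originally the body of the first loop nest  */
        c[i] = a[i] + b[i];    /* originally the body of the second loop nest */
    }
}

/* Intra-loop-nest locality: matrix multiply blocked into TILE x TILE
 * sub-problems so the working set of each block stays cache-resident.
 * Assumes N is a multiple of TILE and c[][] is zero-initialized. */
void tiled_matmul(const double a[N][N], const double b[N][N], double c[N][N])
{
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t kk = 0; kk < N; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t k = kk; k < kk + TILE; k++)
                        for (size_t j = jj; j < jj + TILE; j++)
                            c[i][j] += a[i][k] * b[k][j];
}
```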




Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abdelrahman, T., Manjikian, N., Liu, G., Tandri, S. (1998). Locality Enhancement for Large-Scale Shared-Memory Multiprocessors. In: O’Hallaron, D.R. (ed.) Languages, Compilers, and Run-Time Systems for Scalable Computers. LCR 1998. Lecture Notes in Computer Science, vol 1511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49530-4_24

  • DOI: https://doi.org/10.1007/3-540-49530-4_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65172-7

  • Online ISBN: 978-3-540-49530-7
