Abstract
This paper gives an overview of the locality enhancement techniques used by the Jasmine compiler, currently under development at the University of Toronto. These techniques enhance memory locality, cache locality across loop nests (inter-loop-nest cache locality), and cache locality within a loop nest (intra-loop-nest cache locality) in dense-matrix scientific applications. The compiler also exploits machine-specific features to further enhance locality. Experimental evaluation of these techniques on different multiprocessor platforms indicates that they are effective in improving the overall performance of benchmarks; some of the techniques improve parallel execution time by up to a factor of 6.
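To make the distinction between inter-loop-nest and intra-loop-nest cache locality concrete, the sketch below uses two textbook transformations: loop fusion (reuse across nests) and tiling (reuse within a nest). It is an illustration only and is not taken from the Jasmine compiler; the function names, the matrix size `N`, and the tile size `TILE` are assumptions chosen for the example.

```c
#include <stddef.h>

#define N 64      /* matrix dimension (illustrative assumption) */
#define TILE 16   /* tile edge (illustrative assumption); must divide N */

/* Two separate loop nests: each b[i][j] written by the first nest is read
   again by the second, possibly after it has been evicted from the cache. */
void scale_then_shift(double a[N][N], double b[N][N], double c[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            b[i][j] = 2.0 * a[i][j];
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            c[i][j] = b[i][j] + 1.0;
}

/* Inter-loop-nest locality: fusing the two nests lets b[i][j] be consumed
   right after it is produced, while it is still in cache. */
void scale_then_shift_fused(double a[N][N], double b[N][N], double c[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            b[i][j] = 2.0 * a[i][j];
            c[i][j] = b[i][j] + 1.0;
        }
}

/* A single nest with poor locality: out is written column by column. */
void transpose(double in[N][N], double out[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            out[j][i] = in[i][j];
}

/* Intra-loop-nest locality: tiling keeps a TILE x TILE block of both
   arrays in cache while it is being processed. */
void transpose_tiled(double in[N][N], double out[N][N]) {
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t jj = 0; jj < N; jj += TILE)
            for (size_t i = ii; i < ii + TILE; i++)
                for (size_t j = jj; j < jj + TILE; j++)
                    out[j][i] = in[i][j];
}
```

Both transformed versions compute the same results as their originals; only the order of memory accesses changes. Whether either one pays off on a given machine depends on cache sizes and array dimensions, which is why such decisions are typically left to a compiler.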
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abdelrahman, T., Manjikian, N., Liu, G., Tandri, S. (1998). Locality Enhancement for Large-Scale Shared-Memory Multiprocessors. In: O’Hallaron, D.R. (eds) Languages, Compilers, and Run-Time Systems for Scalable Computers. LCR 1998. Lecture Notes in Computer Science, vol 1511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49530-4_24
DOI: https://doi.org/10.1007/3-540-49530-4_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65172-7
Online ISBN: 978-3-540-49530-7
eBook Packages: Springer Book Archive