Abstract
In this paper, we study the impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique. We find that even though abundant parallelism may exist at the fine-grain level, synchronization and scheduling strategies determine the ultimate performance of the system. Loop-iteration-level parallelism appears to be a more appropriate granularity once those factors are considered. We also study barrier synchronization and data synchronization at the loop-iteration level and find that both schemes are needed for good performance.
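To make the distinction between the two schemes concrete, the sketch below contrasts barrier synchronization and data synchronization on a producer/consumer loop pair with a cross-iteration read. This is a minimal illustration under assumptions introduced here (the array names, the problem size N, and the use of OpenMP pragmas with C11 atomics as the synchronization mechanism); it is not the paper's simulator or benchmark code.

```c
/* Sketch only; compile with e.g.: cc -fopenmp sync_sketch.c */
#include <stdatomic.h>
#include <stdio.h>

#define N 1024                    /* hypothetical problem size */

double a[N], b[N];
atomic_int ready[N];              /* per-element "full" flags for data sync;
                                     zero-initialized as static storage */

/* Barrier synchronization: the implicit barrier after the first loop
 * forces every producer iteration to finish before any consumer starts. */
static void barrier_version(void) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = 0.5 * i;                   /* producer loop */
        /* implicit barrier here */
        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = a[i] + a[N - 1 - i];       /* consumer loop */
    }
}

/* Data synchronization: no barrier between the loops; each consumer
 * iteration spins only on the two elements it actually reads, so
 * independent iterations of the two loops may overlap in time. */
static void data_sync_version(void) {
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            atomic_store(&ready[i], 1);       /* mark a[i] as produced */
        }
        #pragma omp for nowait
        for (int i = 0; i < N; i++) {
            while (!atomic_load(&ready[i]) || !atomic_load(&ready[N - 1 - i]))
                ;                             /* wait only for the exact inputs */
            b[i] = a[i] + a[N - 1 - i];
        }
    }
}

int main(void) {
    barrier_version();
    double sum1 = 0;
    for (int i = 0; i < N; i++) sum1 += b[i];

    data_sync_version();                      /* ready[] still all zero here */
    double sum2 = 0;
    for (int i = 0; i < N; i++) sum2 += b[i];

    printf("barrier: %g  data-sync: %g\n", sum1, sum2);
    return 0;
}
```

Under barrier synchronization a single straggling producer iteration delays every consumer, while the data-synchronized version lets each consumer proceed as soon as its own operands are full; this mirrors the trade-off the abstract describes at loop-iteration granularity.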