Abstract
In this paper, we study the impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique. We find that even though abundant parallelism may exist at the fine-grain level, synchronization and scheduling strategies determine the ultimate performance of the system. Loop-iteration-level parallelism appears to be a more appropriate granularity once those factors are considered. We also study barrier synchronization and data synchronization at the loop-iteration level and find that both schemes are needed for good performance.
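To make the distinction between the two schemes concrete, the sketch below contrasts barrier synchronization and data synchronization on a producer/consumer loop pair with a cross-iteration read. This is a minimal illustration under assumptions introduced here (the array names, the problem size N, and the use of OpenMP pragmas with C11 atomics as the synchronization mechanism); it is not the paper's simulator or benchmark code.

```c
/* Sketch only; compile with e.g.: cc -fopenmp sync_sketch.c */
#include <stdatomic.h>
#include <stdio.h>

#define N 1024                    /* hypothetical problem size */

double a[N], b[N];
atomic_int ready[N];              /* per-element "full" flags for data sync;
                                     zero-initialized as static storage */

/* Barrier synchronization: the implicit barrier after the first loop
 * forces every producer iteration to finish before any consumer starts. */
static void barrier_version(void) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = 0.5 * i;                   /* producer loop */
        /* implicit barrier here */
        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = a[i] + a[N - 1 - i];       /* consumer loop */
    }
}

/* Data synchronization: no barrier between the loops; each consumer
 * iteration spins only on the two elements it actually reads, so
 * independent iterations of the two loops may overlap in time. */
static void data_sync_version(void) {
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            atomic_store(&ready[i], 1);       /* mark a[i] as produced */
        }
        #pragma omp for nowait
        for (int i = 0; i < N; i++) {
            while (!atomic_load(&ready[i]) || !atomic_load(&ready[N - 1 - i]))
                ;                             /* wait only for the exact inputs */
            b[i] = a[i] + a[N - 1 - i];
        }
    }
}

int main(void) {
    barrier_version();
    double sum1 = 0;
    for (int i = 0; i < N; i++) sum1 += b[i];

    data_sync_version();                      /* ready[] still all zero here */
    double sum2 = 0;
    for (int i = 0; i < N; i++) sum2 += b[i];

    printf("barrier: %g  data-sync: %g\n", sum1, sum2);
    return 0;
}
```

Under barrier synchronization a single straggling producer iteration delays every consumer, while the data-synchronized version lets each consumer proceed as soon as its own operands are full; this mirrors the trade-off the abstract describes at loop-iteration granularity.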