ABSTRACT
We want to perform compile-time analysis of an SPMD program and place barriers in it to synchronize it correctly, minimizing the runtime cost of the synchronization. This is the barrier minimization problem. No full solution to the problem has been given previously. Here we model the problem with a new combinatorial structure, a nested family of sets of circular intervals. We show that barrier minimization is equivalent to finding a hierarchy of minimum-cardinality point sets that cut all intervals. For a single loop, modeled as a simple family of circular intervals, a linear-time algorithm is known. We extend this result, finding a linear-time solution for nested circular interval families. This result solves the barrier minimization problem for general nested loops.
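To make the single-loop case concrete, here is a minimal sketch of what "a minimum-cardinality point set that cuts all intervals" means. It is not the linear-time algorithm referred to in the abstract: it is a simple brute-force approach that tries every slot as a first cut point and then applies the classic greedy rule for intervals on a line. The discretization is an assumption made for illustration: the loop body is treated as `n` slots numbered `0..n-1` arranged on a circle, and an interval `(s, t)` covers slots `s, s+1, ..., t` modulo `n`. The names `cuts` and `min_cut_points` are hypothetical.

```python
def cuts(n, p, interval):
    """Return True if slot p lies inside the circular interval (s, t)."""
    s, t = interval
    return (p - s) % n <= (t - s) % n


def min_cut_points(n, intervals):
    """Return a smallest list of slots that cuts every circular interval.

    Brute-force sketch (assumption: not the paper's linear-time method).
    Each slot p is tried as a cut point; intervals not containing p become
    ordinary intervals once the circle is unrolled at p, and the greedy
    earliest-right-endpoint rule is optimal for those.
    """
    if not intervals:
        return []
    best = None
    for p in range(n):
        remaining = []
        for s, t in intervals:
            a = (s - p) % n        # start offset relative to p
            length = (t - s) % n   # steps from s to t along the circle
            if a == 0 or a + length >= n:
                continue           # interval contains p, already cut
            remaining.append((a, a + length))
        chosen = [p]
        last = 0                   # offset of the most recent cut point
        for a, b in sorted(remaining, key=lambda iv: iv[1]):
            if a > last:           # not yet cut: place a point at its right end
                last = b
                chosen.append((p + b) % n)
        if best is None or len(chosen) < len(best):
            best = chosen
    return best


# Example: four dependences around a loop body of 8 slots.
example = [(0, 2), (1, 3), (5, 6), (7, 1)]
points = min_cut_points(8, example)
print(len(points))  # prints 2
```

The sketch runs in time proportional to `n` times the cost of sorting the intervals, so it is useful only for checking small instances by hand; the nested-loop case described in the abstract needs a hierarchy of such point sets and is not covered here.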
Index Terms
- A linear-time algorithm for optimal barrier placement