Abstract
High-level loop optimizations are necessary to achieve good performance over a wide variety of processors. Their performance impact can be significant because they involve in-depth program transformations that aim to sustain a balanced workload over the computational, storage, and communication resources of the target architecture. Therefore, it is mandatory that the compiler accurately models the target architecture as well as the effects of complex code restructuring.
However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack frameworks to express complex interactions of transformation sequences, they typically uncover only a fraction of the peak performance available on many applications. We propose a complete iterative framework to address these issues. We rely on the polyhedral model to construct and traverse a large and expressive search space. This space encompasses only legal, distinct versions resulting from the restructuring of any static control loop nest. We first propose a feedback-driven iterative heuristic tailored to the search space properties of the polyhedral model. Although it quickly converges to good solutions for small kernels, larger benchmarks containing higher-dimensional spaces are more challenging, and our heuristic misses opportunities for significant performance improvement. Thus, we introduce the use of a genetic algorithm with specialized operators that leverage the polyhedral representation of program dependences. We provide experimental evidence that the genetic algorithm effectively traverses huge optimization spaces, achieving good performance improvements on large loop nests.
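To make the idea of searching a space of legal multidimensional schedules concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the authors' implementation. It assumes a single statement in a two-deep loop nest whose dependences are constant distance vectors (the polyhedral model works with dependence polyhedra over many statements), schedule coefficients bounded to {-1, 0, 1}, and a hypothetical `toy_cost` standing in for the measured execution time that drives a feedback-directed search. A 2x2 integer matrix `t` maps iteration (i, j) to the time vector t·(i, j); it is treated as legal when t·d is lexicographically positive for every dependence distance d and t is non-singular. The names `mutate`, `crossover` and `genetic_search` are hypothetical, the operators keep individuals legal by simple rejection rather than by exploiting dependence polyhedra, and distinctness of the generated code is not modeled.

```python
import itertools
import random

# Assumption: one statement in a 2-deep loop nest (i, j) whose dependences
# are constant distance vectors; the polyhedral model uses dependence
# polyhedra and handles many statements.
DEPENDENCES = [(1, 0), (0, 1)]
COEFFS = (-1, 0, 1)  # assumed bound on schedule coefficients
IDENTITY = ((1, 0), (0, 1))  # original execution order, legal for these deps


def apply(t, d):
    """Map a distance vector d through the schedule matrix t (t . d)."""
    return tuple(row[0] * d[0] + row[1] * d[1] for row in t)


def lex_positive(v):
    """True if the first non-zero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_legal(t, deps=DEPENDENCES):
    """Every dependence must still go forward in multidimensional time,
    and the schedule must be one-to-one (non-singular 2x2 matrix)."""
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    return det != 0 and all(lex_positive(apply(t, d)) for d in deps)


def legal_space():
    """Enumerate every legal schedule matrix within the coefficient bounds."""
    rows = list(itertools.product(COEFFS, repeat=2))
    return [t for t in itertools.product(rows, repeat=2) if is_legal(t)]


def toy_cost(t, deps=DEPENDENCES):
    """Hypothetical stand-in for a measured run time: reward schedules whose
    outer dimension carries every dependence (inner loop becomes parallel)
    and penalize large coefficients (more complex loop bounds)."""
    uncarried = sum(1 for d in deps if apply(t, d)[0] <= 0)
    return 10 * uncarried + sum(abs(c) for row in t for c in row)


def random_matrix(rng):
    return tuple(tuple(rng.choice(COEFFS) for _ in range(2)) for _ in range(2))


def mutate(t, rng, tries=100):
    """Redraw one row (time dimension); reject illegal results."""
    for _ in range(tries):
        rows = list(t)
        rows[rng.randrange(2)] = (rng.choice(COEFFS), rng.choice(COEFFS))
        if is_legal(tuple(rows)):
            return tuple(rows)
    return t


def crossover(a, b, rng):
    """Row crossover: take each time dimension from one of the parents."""
    child = tuple(rng.choice((ra, rb)) for ra, rb in zip(a, b))
    return child if is_legal(child) else a


def genetic_search(pop_size=16, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [IDENTITY]  # seed the population with the original schedule
    while len(pop) < pop_size:
        cand = random_matrix(rng)
        if is_legal(cand):
            pop.append(cand)
    for _ in range(generations):
        pop.sort(key=toy_cost)
        elite = pop[: pop_size // 2]  # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = elite + children
    return min(pop, key=toy_cost)
```

For example, `best = genetic_search(seed=1)` returns a legal schedule matrix. Under these assumptions only 12 schedules are legal, so `legal_space()` can simply be enumerated; the cheapest ones under `toy_cost` have outer row (1, 1), a wavefront on i + j. Exhaustive enumeration is exactly what stops being feasible for the huge, higher-dimensional spaces the abstract targets, which is where a population-based search becomes attractive.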
Iterative optimization in the polyhedral model: Part II, multidimensional time. In PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation.