research-article

Iterative optimization in the polyhedral model: Part II, multidimensional time

Authors:
Louis-Noël Pouchet, ALCHEMY Group, INRIA Saclay -- Ile-de-France and Paris-Sud University, Orsay, France
Cédric Bastoul, ALCHEMY Group, INRIA Saclay -- Ile-de-France and Paris-Sud University, Orsay, France
Albert Cohen, ALCHEMY Group, INRIA Saclay -- Ile-de-France and Paris-Sud University, Orsay, France
John Cavazos, Dept. of Computer & Information Sciences, University of Delaware, Newark, DE, USA

ACM SIGPLAN Notices, Volume 43, Issue 6, June 2008, pp 90–100, https://doi.org/10.1145/1379022.1375594

Published: 07 June 2008

Abstract

High-level loop optimizations are necessary to achieve good performance over a wide variety of processors. Their performance impact can be significant because they involve in-depth program transformations that aim to sustain a balanced workload over the computational, storage, and communication resources of the target architecture. Therefore, it is mandatory that the compiler accurately models the target architecture as well as the effects of complex code restructuring.

However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack frameworks to express complex interactions of transformation sequences, they typically uncover only a fraction of the peak performance available on many applications. We propose a complete iterative framework to address these issues. We rely on the polyhedral model to construct and traverse a large and expressive search space. This space encompasses only legal, distinct versions resulting from the restructuring of any static control loop nest. We first propose a feedback-driven iterative heuristic tailored to the search space properties of the polyhedral model. Although it quickly converges to good solutions for small kernels, larger benchmarks containing higher-dimensional spaces are more challenging, and our heuristic misses opportunities for significant performance improvement. Thus, we introduce the use of a genetic algorithm with specialized operators that leverage the polyhedral representation of program dependences. We provide experimental evidence that the genetic algorithm effectively traverses huge optimization spaces, achieving good performance improvements on large loop nests.
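To make the idea of a genetic search restricted to legal program versions more concrete, here is a minimal Python sketch. It is not the operators described in the paper: the constraint system `A`, `b`, the synthetic `cost` function, and the helpers `is_legal`, `mutate`, `crossover`, and `evolve` are all hypothetical names chosen for illustration. The sketch assumes the legal space is a small integer polytope and that mutation and crossover simply reject any candidate that leaves it; a real system would measure run time instead of using a synthetic cost.

```python
import random

# Hypothetical legal space: integer points x with A @ x + b >= 0 (row-wise).
# It stands in for a polytope of legal schedule coefficients; the
# constraints below are made up for illustration.
A = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0),
     (0, 0, 1), (0, 0, -1), (1, 1, -1)]
b = [1, 1, 1, 1, 1, 1, 0]


def is_legal(x):
    """True when x satisfies every affine constraint of the toy polytope."""
    return all(sum(a * v for a, v in zip(row, x)) + c >= 0
               for row, c in zip(A, b))


def cost(x):
    """Stand-in for a measured run time (lower is better); purely synthetic."""
    target = (1, 0, 1)
    return sum((v - t) ** 2 for v, t in zip(x, target))


def mutate(x, rng):
    """Shift one coefficient by +/-1; keep the result only if it stays legal."""
    i = rng.randrange(len(x))
    y = list(x)
    y[i] += rng.choice((-1, 1))
    y = tuple(y)
    return y if is_legal(y) else x


def crossover(p, q, rng):
    """Uniform crossover; fall back to the first parent if the child is illegal."""
    child = tuple(rng.choice(pair) for pair in zip(p, q))
    return child if is_legal(child) else p


def evolve(generations=30, size=12, seed=0):
    """Run a small elitist genetic search that never leaves the legal space."""
    rng = random.Random(seed)
    values = (-1, 0, 1)
    legal = [(i, j, k) for i in values for j in values for k in values
             if is_legal((i, j, k))]
    population = [rng.choice(legal) for _ in range(size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: size // 2]  # keep the better half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return min(population, key=cost)


if __name__ == "__main__":
    best = evolve()
    print(best, cost(best), is_legal(best))
```

Rejecting illegal candidates is the simplest way to keep the population inside the space; operators that exploit the structure of the polytope directly, as the abstract suggests, would avoid wasting candidates on rejected moves.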

Index Terms

Iterative optimization in the polyhedral model: part ii, multidimensional time
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
ACM SIGPLAN Notices, Volume 43, Issue 6
PLDI '08
June 2008
382 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1379022

PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2008
396 pages
ISBN: 9781595938602
DOI: 10.1145/1375581
General Chair: Rajiv Gupta, University of California, Riverside, USA
Program Chair: Saman Amarasinghe, Massachusetts Institute of Technology, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
affine scheduling
genetic algorithm
iterative compilation
loop transformation
Qualifiers
- research-article
Article Metrics
- Total Citations: 109
- Total Downloads: 1,092
- Downloads (Last 12 months): 35
- Downloads (Last 6 weeks): 4