research-article

Iterative optimization in the polyhedral model: Part II, multidimensional time

Published: 07 June 2008

Abstract

High-level loop optimizations are necessary to achieve good performance over a wide variety of processors. Their performance impact can be significant because they involve in-depth program transformations that aim to sustain a balanced workload over the computational, storage, and communication resources of the target architecture. The compiler must therefore accurately model both the target architecture and the effects of complex code restructuring.

However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack frameworks to express complex interactions of transformation sequences, they typically uncover only a fraction of the peak performance available for many applications. We propose a complete iterative framework to address these issues. We rely on the polyhedral model to construct and traverse a large and expressive search space. This space encompasses only legal, distinct versions resulting from the restructuring of any static control loop nest. We first propose a feedback-driven iterative heuristic tailored to the search space properties of the polyhedral model. Although it converges quickly to good solutions for small kernels, larger benchmarks with higher-dimensional spaces are more challenging, and our heuristic misses opportunities for significant performance improvement. Thus, we introduce the use of a genetic algorithm with specialized operators that leverage the polyhedral representation of program dependences. We provide experimental evidence that the genetic algorithm effectively traverses huge optimization spaces, achieving good performance improvements on large loop nests.
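As a rough illustration of the idea of a genetic search restricted to legal versions, the sketch below evolves small integer coefficient vectors and keeps only candidates that pass a legality test. It is not the paper's algorithm: the abstract does not describe its specialized operators, so this version simply rejects illegal children. The names `genetic_search`, `toy_is_legal`, and `toy_cost`, the one-dimensional schedule encoding, the dependence distance vectors, and the cost function are all assumptions made for the example; in a real feedback-driven setting the cost would come from compiling and timing each version.

```python
import random
from typing import Callable, Tuple

# Hypothetical encoding: a candidate is a tuple of integer schedule coefficients.
Schedule = Tuple[int, ...]


def genetic_search(
    is_legal: Callable[[Schedule], bool],
    cost: Callable[[Schedule], float],
    dims: int,
    bounds: Tuple[int, int] = (-2, 3),
    pop_size: int = 20,
    generations: int = 30,
    seed: int = 0,
) -> Schedule:
    """Toy genetic search that only ever keeps legal candidates.

    Assumes dims >= 2, pop_size >= 4, and that legal candidates exist
    within the bounds (otherwise the sampling loops do not terminate).
    """
    rng = random.Random(seed)
    lo, hi = bounds

    def random_legal() -> Schedule:
        # Rejection sampling over the bounded coefficient box.
        while True:
            s = tuple(rng.randint(lo, hi) for _ in range(dims))
            if is_legal(s):
                return s

    def mutate(s: Schedule) -> Schedule:
        # Resample one randomly chosen coefficient.
        i = rng.randrange(dims)
        child = list(s)
        child[i] = rng.randint(lo, hi)
        return tuple(child)

    def crossover(a: Schedule, b: Schedule) -> Schedule:
        # Single-point crossover.
        cut = rng.randrange(1, dims)
        return a[:cut] + b[cut:]

    population = [random_legal() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = mutate(crossover(a, b))
            if is_legal(child):  # discard children that violate a dependence
                children.append(child)
        population = survivors + children
    return min(population, key=cost)


# Assumed toy problem: schedule theta(i, j) = a*i + b*j with dependence
# distance vectors (1, 0) and (1, -1); legality requires theta(d) >= 1.
DEPENDENCES = [(1, 0), (1, -1)]


def toy_is_legal(s: Schedule) -> bool:
    return all(sum(c * d for c, d in zip(s, dep)) >= 1 for dep in DEPENDENCES)


def toy_cost(s: Schedule) -> float:
    # Stand-in for a measured run time; purely illustrative.
    return abs(s[0] - 2) + abs(s[1] + 1)


best = genetic_search(toy_is_legal, toy_cost, dims=2)
```

Because the initial population is sampled from legal points and every child is checked before it is admitted, the returned candidate is always legal, whatever its cost. Rejection is the simplest way to stay inside the legal space; operators that build legal children directly would avoid wasted trials.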


    Published in

      ACM SIGPLAN Notices, Volume 43, Issue 6 (PLDI '08), June 2008, 382 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1379022

      PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2008, 396 pages
      ISBN: 9781595938602
      DOI: 10.1145/1375581
      General Chair: Rajiv Gupta
      Program Chair: Saman Amarasinghe

      Copyright © 2008 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
