The Polyhedral Model of Nonlinear Loops

Published: 08 December 2015

Abstract

Runtime code optimization and speculative execution are becoming increasingly prominent to leverage performance in the current multi- and many-core era. However, a wider and more efficient use of such techniques is mainly hampered by the prohibitive time overhead induced by centralized data race detection, dynamic code behavior modeling, and code generation. Most of the existing Thread Level Speculation (TLS) systems rely on naively slicing the target loops into chunks and trying to execute the chunks in parallel with the help of a centralized performance-penalizing verification module that takes care of data races. Due to the lack of a data dependence model, these speculative systems are not capable of doing advanced transformations, and, more importantly, the chances of rollback are high. The polyhedral model is a well-known mathematical model to analyze and optimize loop nests. The current state-of-the-art tools limit the application of the polyhedral model to static control codes. Thus, none of these tools can generally handle codes with while loops, indirect memory accesses, or pointers. Apollo (Automatic POLyhedral Loop Optimizer) is a framework that goes one step beyond and applies the polyhedral model dynamically by using TLS. Apollo can predict, at runtime, whether the codes are behaving linearly or not, and it applies polyhedral transformations on-the-fly. This article presents a novel system that enables Apollo to handle codes whose memory accesses and loop bounds are not necessarily linear. More generally, this approach expands the applicability of the polyhedral model at runtime to a wider class of codes. Plugging together both linear and nonlinear accesses to the dependence prediction model enables the application of polyhedral loop optimizing transformations even for nonlinear code kernels while also allowing a low-cost speculation verification.
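To make the idea of predicting at runtime whether a code "behaves linearly" more concrete, the following minimal sketch profiles a few iterations of a two-deep loop nest and checks whether the observed addresses fit an affine function of the loop indices. It is an illustration only, not Apollo's implementation: the function names (`fit_affine`, `profile`), the base address `0x1000`, the element size, and the array shapes are all assumptions made for this example.

```python
def fit_affine(samples):
    """Return (c0, c1, c2) such that addr == c0 + c1*i + c2*j for every
    profiled sample, or None if the samples are not affine in (i, j).

    `samples` maps (i, j) -> observed address and must contain the keys
    (0, 0), (1, 0) and (0, 1). Hypothetical helper, not an Apollo API.
    """
    base = samples[(0, 0)]
    c1 = samples[(1, 0)] - base  # stride along the outer index i
    c2 = samples[(0, 1)] - base  # stride along the inner index j
    for (i, j), addr in samples.items():
        if addr != base + c1 * i + c2 * j:
            return None  # at least one access deviates: not linear
    return base, c1, c2


def profile(access, n_i=3, n_j=3):
    """Record the address touched at each of the first n_i x n_j iterations."""
    return {(i, j): access(i, j) for i in range(n_i) for j in range(n_j)}


def linear_access(i, j):
    # Row-major A[i][j] with 100 columns of 8-byte elements (assumed layout).
    return 0x1000 + 8 * (100 * i + j)


IDX = [5, 0, 7]  # assumed contents of an index array


def indirect_access(i, j):
    # Indirect access A[IDX[j]]: the address depends on data, not only on (i, j).
    return 0x1000 + 8 * IDX[j]


print(fit_affine(profile(linear_access)))    # (4096, 800, 8)
print(fit_affine(profile(indirect_access)))  # None
```

A speculative system built on this idea would use the fitted coefficients to predict future accesses and fall back to the original code when a later access contradicts the prediction. How nonlinear accesses are modeled is described in the article itself and is not reproduced here.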

    Reviews

    William M. Waite

    The polyhedral model is a technique for optimizing loop nests in a program. Each calculated value is modeled by a point, and the dependencies among the values from one iteration to the next are modeled by vectors connecting the points. The result is an object called a polytope that lies in an n-dimensional space, where n is the number of loops. Each point of the polytope has the values of the loop indexes as its coordinates in that space. Performance improvements can be made by applying certain kinds of geometrical transformations to the polytope and then converting the result back to code. Analysis is carried out at compile time, based on static properties of the program, and must therefore make conservative assumptions.

    Previous papers proposed a framework to support dynamic analysis and speculative execution of the model. Here, Sukumaran-Rajam and Clauss show how to remove some of the constraints that the model places on the behavior of loop variables and the form of index expressions. After introducing the problem, the authors discuss the polyhedral model and its limitations. They then provide the architecture of Apollo, the system discussed in earlier papers, and explain how they extend it to remove its constraints. The paper concludes with a review of related work and some results obtained by applying the extended framework to programs selected from several standard benchmark suites.

    The paper is well written but quite dense; I would not recommend it to a novice in the optimization area. A reader is expected to understand multicore optimization techniques in general and the polyhedral model in particular. There is a good set of references to help access this material, and the examples are well chosen. A person familiar with the area should have little difficulty understanding and evaluating the authors' contribution.

    Online Computing Reviews Service
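The geometric picture in the review (iterations as points, dependences as vectors, transformations applied to the polytope) can be sketched for a small example. The loop nest, the brute-force dependence search, and the names `dependence_vectors` and `is_legal` below are assumptions chosen for illustration; real polyhedral tools compute dependences symbolically rather than by enumeration.

```python
from itertools import product


def dependence_vectors(n, m):
    """Enumerate flow-dependence distance vectors of the assumed loop nest

        for i in 1..n-1:
            for j in 1..m-1:
                A[i][j] = A[i-1][j] + A[i][j-1]

    Each iteration (i, j) is a point of the iteration domain; a vector
    connects the iteration that wrote a cell to the one that reads it.
    """
    last_writer = {}
    vectors = set()
    # product() yields points in lexicographic order, i.e. original loop order.
    for i, j in product(range(1, n), range(1, m)):
        for cell in ((i - 1, j), (i, j - 1)):
            if cell in last_writer:
                wi, wj = last_writer[cell]
                vectors.add((i - wi, j - wj))
        last_writer[(i, j)] = (i, j)
    return vectors


def is_legal(transform, vectors):
    """A 2x2 integer transformation (tuple of rows) preserves the dependences
    if every transformed vector stays lexicographically positive."""
    for d in vectors:
        t = tuple(row[0] * d[0] + row[1] * d[1] for row in transform)
        if t <= (0, 0):  # Python compares tuples lexicographically
            return False
    return True


INTERCHANGE = ((0, 1), (1, 0))     # swap the i and j loops
REVERSE_OUTER = ((-1, 0), (0, 1))  # run the i loop backwards
SKEW = ((1, 1), (0, 1))            # new outer index is i + j

deps = dependence_vectors(4, 4)
print(sorted(deps))                      # [(0, 1), (1, 0)]
print(is_legal(INTERCHANGE, deps))       # True
print(is_legal(REVERSE_OUTER, deps))     # False
print(is_legal(SKEW, deps))              # True
```

Reversing the outer loop is rejected because it would execute a reader before the iteration that produces its input, while interchange and skewing keep every dependence pointing forward in the new execution order.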

    • Published in

      ACM Transactions on Architecture and Code Optimization, Volume 12, Issue 4
      January 2016
      848 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/2836331

      Copyright © 2015 ACM

      © 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 8 December 2015
      • Revised: 1 October 2015
      • Accepted: 1 October 2015
      • Received: 1 May 2015

      Qualifiers

      • research-article
      • Research
      • Refereed
