Abstract
Runtime code optimization and speculative execution are increasingly prominent means of improving performance in the current multi- and many-core era. However, wider and more efficient use of such techniques is hampered mainly by the prohibitive time overhead of centralized data race detection, dynamic code behavior modeling, and code generation. Most existing Thread Level Speculation (TLS) systems naively slice the target loops into chunks and try to execute the chunks in parallel, relying on a centralized, performance-penalizing verification module to handle data races. Because they lack a data dependence model, these speculative systems cannot perform advanced transformations and, more importantly, their chances of rollback are high. The polyhedral model is a well-known mathematical model for analyzing and optimizing loop nests. Current state-of-the-art tools limit the application of the polyhedral model to static control codes. Thus, none of these tools can generally handle codes with while loops, indirect memory accesses, or pointers. Apollo (Automatic POLyhedral Loop Optimizer) is a framework that goes one step further and applies the polyhedral model dynamically by using TLS. Apollo can predict, at runtime, whether the code is behaving linearly, and it applies polyhedral transformations on-the-fly. This article presents a novel system that enables Apollo to handle codes whose memory accesses and loop bounds are not necessarily linear. More generally, this approach expands the applicability of the polyhedral model at runtime to a wider class of codes. Combining linear and nonlinear accesses in the dependence prediction model enables polyhedral loop-optimizing transformations even for nonlinear code kernels, while also allowing low-cost verification of the speculation.
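To make the idea of predicting linear versus nonlinear behavior at runtime concrete, the following is a minimal sketch and not Apollo's implementation. It assumes that a profiling phase has recorded `(iteration, address)` samples for one memory access. It first tries to interpolate them exactly with an affine function. If that fails, it falls back to a least-squares line with a tolerance band, which is only one plausible way to model a nonlinear access. The function names `fit_affine`, `fit_band`, `classify`, and `within_prediction` are hypothetical.

```python
from fractions import Fraction


def fit_affine(samples):
    """Return (a, b) if addr == a*i + b holds for every sample, else None."""
    (i0, m0), (i1, m1) = samples[0], samples[1]
    a = Fraction(m1 - m0, i1 - i0)
    b = m0 - a * i0
    if all(a * i + b == m for i, m in samples):
        return a, b
    return None


def fit_band(samples):
    """Least-squares line through the samples plus the largest deviation.

    Assumption: the deviation is used as a band half-width around the line.
    """
    n = len(samples)
    mean_i = Fraction(sum(i for i, _ in samples), n)
    mean_m = Fraction(sum(m for _, m in samples), n)
    var = sum((i - mean_i) ** 2 for i, _ in samples)
    cov = sum((i - mean_i) * (m - mean_m) for i, m in samples)
    a = cov / var
    b = mean_m - a * mean_i
    width = max(abs(m - (a * i + b)) for i, m in samples)
    return a, b, width


def classify(samples):
    """Build a prediction model from at least two samples with distinct i."""
    affine = fit_affine(samples)
    if affine is not None:
        return ("linear", affine[0], affine[1], Fraction(0))
    a, b, width = fit_band(samples)
    return ("nonlinear", a, b, width)


def within_prediction(model, i, addr):
    """Check a later access against the model; a miss would force a rollback."""
    _, a, b, width = model
    return abs(addr - (a * i + b)) <= width


linear_model = classify([(0, 1000), (1, 1008), (2, 1016)])
nonlinear_model = classify([(0, 0), (1, 10), (2, 0)])
```

In this sketch a linear access gets a band of width zero, so verification reduces to an exact comparison against the predicted address, while a nonlinear access is only checked for staying inside the band. Exact rational arithmetic (`Fraction`) is used to avoid floating-point rounding in the comparison.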
Index Terms
- The Polyhedral Model of Nonlinear Loops