Abstract
The idea of decomposed software pipelining is to decouple the software pipelining problem into a cyclic scheduling problem without resource constraints and an acyclic scheduling problem with resource constraints. In terms of loop transformation and code motion, the technique can be formulated as a combination of loop shifting and loop compaction. Loop shifting moves statements between iterations, thereby changing some loop-independent dependences into loop-carried dependences and vice versa. Loop compaction then schedules the body of the loop considering only loop-independent dependences, but taking into account the details of the target architecture. In this paper, we show how loop shifting can be optimized so as to minimize both the length of the critical path and the number of dependences for loop compaction. The first problem is well known and can be solved by an algorithm due to Leiserson and Saxe. We show that the second optimization (and its combination with the first one) is also polynomially solvable with a fast graph algorithm, a variant of minimum-cost flow algorithms. Finally, we analyze the improvements obtained on loop compaction by experiments on random graphs.
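The effect of loop shifting on the two quantities named above can be illustrated with a small sketch. It is not the authors' algorithm: it only applies a given shift and measures the result, where the paper computes an optimal shift. The edge representation `(u, v, distance)`, the helper names `shift`, `zero_edges` and `critical_path`, the unit latencies, and the retiming-style sign convention `d + r[v] - r[u]` are all assumptions made for illustration.

```python
from collections import defaultdict

def shift(edges, r):
    """Apply shift r to every edge (u, v, d); the new distance is d + r[v] - r[u].

    The sign convention follows circuit retiming and is an assumption here.
    """
    shifted = [(u, v, d + r[v] - r[u]) for (u, v, d) in edges]
    if any(d < 0 for _, _, d in shifted):
        raise ValueError("illegal shift: negative dependence distance")
    return shifted

def zero_edges(edges):
    """Loop-independent dependences are the edges with distance 0."""
    return [(u, v) for (u, v, d) in edges if d == 0]

def critical_path(nodes, edges, latency):
    """Longest latency-weighted path through loop-independent edges only."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in zero_edges(edges):
        succ[u].append(v)
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    finish = {n: latency[n] for n in nodes}
    seen = 0
    while ready:  # topological traversal of the acyclic zero-distance subgraph
        u = ready.pop()
        seen += 1
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + latency[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if seen != len(nodes):
        raise ValueError("cycle of loop-independent dependences")
    return max(finish.values())

# Hypothetical loop body: a chain a -> b -> c -> d inside one iteration,
# plus a loop-carried dependence d -> a of distance 2.
nodes = ["a", "b", "c", "d"]
latency = {n: 1 for n in nodes}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]

r = {"a": 0, "b": 0, "c": 1, "d": 1}  # hand-picked shift, not an optimized one
shifted = shift(edges, r)

before = (critical_path(nodes, edges, latency), len(zero_edges(edges)))
after = (critical_path(nodes, shifted, latency), len(zero_edges(shifted)))
```

In this example the shift turns `b -> c` into a loop-carried dependence, so the body handed to loop compaction has a shorter critical path and fewer loop-independent dependences. The total distance around the cycle is unchanged, which is why a shift never alters the cyclic constraints of the loop.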
REFERENCES
John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 2nd ed., Chap. 4, Morgan-Kaufmann (1996).
Carole Dulong, The IA-64 architecture at work, Computer, 31(7):24–32 (July 1998).
Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan, Software pipelining, ACM Computing Surveys, 27(3):367–432 (September 1995).
Monica S. Lam, Software pipelining: An effective scheduling technique for VLIW machines, SIGPLAN'88 Conf. Progr. Lang. Design and Implementation, ACM Press, Atlanta, Georgia, pp. 318–328 (1988).
B. R. Rau and C. D. Glaeser, Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing, Proc. 14th Ann. Workshop of Microprogramming, pp. 183–198 (October 1981).
B. R. Rau, Iterative modulo scheduling, IJPP, 24(1):3–64 (1996).
R. A. Huff, Lifetime-sensitive modulo scheduling, Conf. Progr. Lang. Design and Implementation (PLDI'93), ACM, pp. 258–267 (1993).
J. Llosa, A. González, E. Ayguadé, and M. Valero, Swing modulo scheduling: A lifetime-sensitive approach, Conf. Parallel Architectures and Compilation Techniques (PACT'96), IEEE Computer Society Press, Boston, Massachusetts (1996).
Alexander Aiken and Alexandru Nicolau, Perfect pipelining: A new loop optimization technique, European Symp. Programming, Vol. 300, Lecture Notes in Computer Science, Springer-Verlag, pp. 221–235 (1988).
M. Rajagopalan and V. H. Allan, Specification of software pipelining using petri nets, IJPP, 22(3):273–301 (1994).
Suneel Jain, Circular scheduling, Conf. Progr. Lang. Design and Implementation (PLDI'91), ACM, pp. 219–228 (1991).
Soo-Mook Moon and Kemal Ebcioğlu, An efficient resource-constrained global scheduling technique for superscalar and VLIW processors, 25th Ann. Int'l. Symp. Microarchitecture, pp. 55–71 (1992).
L.-F. Chao, A. LaPaugh, and E. Sha, Rotation scheduling: A loop pipelining algorithm, 30th ACM-IEEE Design Automation Conf., pp. 566–572 (1993).
F. Gasperoni and U. Schwiegelshohn, Generating close to optimum loop schedules on parallel processors, Parallel Proc. Lett., 4(4):391–403 (1994).
J. Wang, C. Eisenbeis, M. Jourdan, and B. Su, Decomposed software pipelining, IJPP, 22(3):351–373 (1994).
P.-Y. Calland, A. Darte, and Y. Robert, Circuit retiming applied to decomposed software pipelining, IEEE Trans. Parallel Distrib. Syst., 9(1):24–35 (January 1998).
U. Schwiegelshohn, F. Gasperoni, and K. Ebcioğlu, On optimal parallelization of arbitrary loops, Journal of Parallel and Distributed Computing, 11:130–134 (1991).
M. Gondran and M. Minoux, Graphs and Algorithms, John Wiley (1984).
C. Hanen and A. Munier, Cyclic scheduling on parallel processors: An overview. In P. Chrétienne, E. G. Coffman, Jr., J. K. Lenstra, and Z. Liu (eds.), Scheduling Theory and Its Applications, John Wiley (1995).
E. G. Coffman, Jr., Computer and Job-Shop Scheduling Theory, John Wiley (1976).
Myricom, Inc. LANai 3.0 instruction set. Electronic document http://www.myricom.com/scs/L3/doc/inst_toc.html.
Ping Hu, Ordonnancement modulo par recouvrement, 10èmes Rencontres du Parallélisme (RenPar'10), Strasbourg, France (June 1998).
C. E. Leiserson and J. B. Saxe, Retiming synchronous circuitry, Algorithmica, 6(1):5–35 (1991).
Tsing-Fa Lee, Allen C.-H. Wu, Wei-Jeng Chen, Wei-Kai Cheng, and Youn-Long Lin, On the relationship between sequential logic retiming and loop folding, Proc. SASIMI'93, Nara, Japan, pp. 384–393 (October 1993).
Alain Darte, Georges-André Silber, and Frédéric Vivien, Combining retiming and scheduling techniques for loop parallelization and loop tiling, Parallel Proc. Lett., 7(4):379–392 (1997).
F. Gasperoni and U. Schwiegelshohn, Transforming cyclic scheduling problems into acyclic ones. In P. Chrétienne, E. G. Coffman, Jr., J. K. Lenstra, and Z. Liu (eds.), Scheduling Theory and Its Applications, John Wiley, pp. 241–258 (1995).
Trimaran, An infrastructure for research in instruction level parallelism. Electronic document http://www.trimaran.org.
Salto, Salto: System of assembly language transformation and optimization. Electronic document http://www.irisa.fr/caps/projects/Salto/.
A. Eichenberger, E. S. Davidson, and S. G. Abraham, Minimum register requirements for a modulo schedule, Proc. 27th Int'l. Symp. Microarchitecture, San Jose, California, pp. 75–84 (1994).
Antoine Sawaya, Pipeline logiciel: Découplage et contraintes de registres, Ph.D. thesis, Université de Versailles, France (1997).
Cite this article
Darte, A., Huard, G. Loop Shifting for Loop Compaction. International Journal of Parallel Programming 28, 499–534 (2000). https://doi.org/10.1023/A:1007506711786