Backtracking-Based Instruction Scheduling to Fill Branch Delay Slots

Baev, Ivan D.; Meleis, Waleed M.; Abraham, Santosh G.

doi:10.1023/A:1020601110391

Published: December 2002

International Journal of Parallel Programming, Volume 30, pages 397–418 (2002)

Abstract

Conventional schedulers schedule operations in dependence order and never revisit or undo a scheduling decision on any operation. In contrast, backtracking schedulers may unschedule operations and can often generate better schedules. This paper develops and evaluates the backtracking approach to fill branch delay slots. We first present the structure of a generic backtracking scheduling algorithm and prove that it terminates. We then describe two more aggressive backtracking schedulers and evaluate their effectiveness. We conclude that aggressive backtracking-based instruction schedulers can effectively improve schedule quality by eliminating branch delay slots with a small amount of additional computation.
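The unschedule-and-replace idea can be illustrated with a minimal sketch. This is not the algorithm from the paper; it assumes a hypothetical single-issue machine with unit latencies and exactly one branch delay slot, and the names `fill_delay_slot`, `schedule`, and `deps` are invented for the illustration. A conventional scheduler has already placed every operation in dependence order and left a `nop` in the delay slot; the sketch revisits that decision by unscheduling one earlier operation that no later operation needs and re-placing it after the branch.

```python
NOP = "nop"


def fill_delay_slot(schedule, deps):
    """Return a new schedule with the branch delay slot filled if possible.

    schedule: list of op names in issue order, ending with [branch, NOP].
    deps: maps an op to the set of ops whose results it needs.
    Assumes single issue, unit latency, and one delay slot (illustrative only).
    """
    if len(schedule) < 2 or schedule[-1] != NOP:
        return list(schedule)
    branch_idx = len(schedule) - 2
    # Walk backwards over the ops already placed before the branch.
    for i in range(branch_idx - 1, -1, -1):
        op = schedule[i]
        later = schedule[i + 1 : branch_idx + 1]  # ops after op, incl. branch
        if all(op not in deps.get(other, set()) for other in later):
            # Unschedule op, close the gap, and re-place it in the delay slot.
            return schedule[:i] + later + [op]
    return list(schedule)


# Schedule produced in dependence order, with an empty delay slot.
conventional = ["load", "add", "store", "cmp", "br", NOP]
deps = {"add": {"load"}, "store": {"add"}, "cmp": {"load"}, "br": {"cmp"}}
improved = fill_delay_slot(conventional, deps)
```

In this example `store` is the latest operation that neither `cmp` nor `br` depends on, so it moves into the delay slot and the schedule shrinks from six slots to five. Moving an operation from before the branch is safe in this model because the delay slot executes whether or not the branch is taken.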

Author information

Authors and Affiliations

Hewlett-Packard, 11000 Wolfe Rd, Cupertino, California, 95014
Ivan D. Baev
Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, 02115
Waleed M. Meleis
Sun Microsystems, Mail Stop SUN03-204, 430 Mary Ave, Sunnyvale, California, 94086
Santosh G. Abraham

About this article

Cite this article

Baev, I.D., Meleis, W.M. & Abraham, S.G. Backtracking-Based Instruction Scheduling to Fill Branch Delay Slots. International Journal of Parallel Programming 30, 397–418 (2002). https://doi.org/10.1023/A:1020601110391

Issue Date: December 2002
DOI: https://doi.org/10.1023/A:1020601110391
