Abstract
Speculative execution is the execution of instructions before it is known whether they should be executed. Compiler-based speculative execution has the potential to achieve both a high instructions-per-cycle rate and a high clock rate. Purely compiler-based approaches, however, severely constrain instruction scheduling because they have only a limited ability to handle the side effects of speculative execution. Significant performance improvement is therefore difficult to achieve in non-numerical applications. This paper proposes a new architectural mechanism, called predicating, which provides unconstrained speculative execution. Predicating removes the restrictions that limit the compiler's ability to schedule instructions. With our hardware support, the compiler can move instructions across multiple basic block boundaries from any succeeding control path. Predicating buffers the side effects of a speculatively executed instruction together with its predicate, and the buffered predicate is then used to commit or squash those side effects efficiently. The mechanism also provides a scheme for handling speculative exceptions. This scheme, called the future condition, properly postpones speculative exceptions and efficiently restarts the process. We show that our mechanism can be implemented with a modest amount of hardware and little complexity. The evaluation results show that our mechanism significantly improves performance, achieving a 2.45x speedup over scalar machines.
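To make the commit-or-squash idea concrete, the following is a minimal software sketch, not the hardware design proposed in the paper. It assumes a predicate can be modeled as a set of branch conditions with required outcomes. All names in it (`Entry`, `PredicatedBuffer`, `speculate`, `resolve`) are hypothetical and chosen only for illustration. A speculative exception is recorded in the buffer and reported only if the faulting instruction's predicate turns out to be true. The restart step of the future condition scheme is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    reg: str                      # destination register
    value: object                 # speculative result (None if the op faulted)
    predicate: dict               # condition name -> outcome required to commit
    fault: Optional[str] = None   # deferred speculative exception, if any


@dataclass
class PredicatedBuffer:
    regs: dict = field(default_factory=dict)      # committed (architectural) state
    pending: list = field(default_factory=list)   # buffered speculative state
    resolved: dict = field(default_factory=dict)  # branch outcomes known so far

    def speculate(self, reg, compute, predicate):
        """Execute early and buffer the result (or fault) under its predicate."""
        try:
            entry = Entry(reg, compute(), dict(predicate))
        except ArithmeticError as exc:
            # Postpone the exception instead of raising it speculatively.
            entry = Entry(reg, None, dict(predicate), fault=repr(exc))
        self.pending.append(entry)

    def resolve(self, cond, outcome):
        """Record a branch outcome, then commit or squash buffered entries."""
        self.resolved[cond] = outcome
        still_pending, faults = [], []
        for e in self.pending:
            known = {c: self.resolved[c] for c in e.predicate if c in self.resolved}
            if any(known[c] != e.predicate[c] for c in known):
                continue                        # predicate is false: squash
            if len(known) < len(e.predicate):
                still_pending.append(e)         # predicate not yet decided
            elif e.fault is not None:
                faults.append(e)                # predicate true and op faulted
            else:
                self.regs[e.reg] = e.value      # predicate is true: commit
        self.pending = still_pending
        if faults:
            first = faults[0]
            raise RuntimeError(f"deferred exception on {first.reg}: {first.fault}")


buf = PredicatedBuffer(regs={"r1": 10, "r2": 0})
buf.speculate("r3", lambda: buf.regs["r1"] + 1, {"c0": True})
buf.speculate("r4", lambda: buf.regs["r1"] // buf.regs["r2"], {"c0": False})
buf.speculate("r5", lambda: 7, {"c0": True, "c1": True})
buf.resolve("c0", True)
print(buf.regs, [e.reg for e in buf.pending])
```

In this usage, resolving `c0` as true commits `r3` and squashes `r4`, so the division by zero on the untaken path is never reported. `r5` stays buffered because its predicate also depends on `c1`, which loosely mirrors an instruction that was moved across two basic block boundaries.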
Index Terms
- Unconstrained speculative execution with predicated state buffering
ISCA '95: Proceedings of the 22nd Annual International Symposium on Computer Architecture