
The superblock: An effective technique for VLIW and superscalar compilation

The Journal of Supercomputing

Abstract

A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to utilize the parallel hardware effectively. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. These techniques are based on a novel structure called the superblock. The superblock enables the optimizer and scheduler to extract more ILP along the important execution paths by systematically removing constraints due to the unimportant paths. Superblock optimization and scheduling have been implemented in the IMPACT-I compiler. This implementation gives us a unique opportunity to fully understand the issues involved in incorporating these techniques into a real compiler. Superblock optimization and scheduling are shown to be useful when a variety of architectural features are taken into account.
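The abstract does not spell out how a superblock is built. In the IMPACT work, a superblock is a trace with a single entry at the top and possibly several exits. It is formed in two steps. First, a frequently executed trace is selected using execution-profile data. Second, tail duplication removes side entrances into that trace.

The following is a minimal sketch of those two steps, written under several assumptions:

- The dictionary-based CFG and the function names `select_trace`, `tail_duplicate`, and `predecessors` are hypothetical.
- The "'" suffix on duplicated blocks is also a hypothetical convention.
- The greedy "follow the most frequent successor" rule is a simplification. It is not the paper's exact trace-selection heuristic.

```python
# Hypothetical CFG representation (not IMPACT's): each block name maps to
# {successor block: profiled execution count of that edge}.
CFG = dict[str, dict[str, int]]


def predecessors(cfg: CFG) -> dict[str, set[str]]:
    """Invert the successor map so side entrances can be detected."""
    preds: dict[str, set[str]] = {b: set() for b in cfg}
    for block, succs in cfg.items():
        for s in succs:
            preds.setdefault(s, set()).add(block)
    return preds


def select_trace(cfg: CFG, entry: str, visited: set[str]) -> list[str]:
    """Grow a trace from `entry` by repeatedly following the most frequently
    executed successor edge (a simplified, greedy heuristic). Stop at a block
    with no successors or at one already placed in a trace."""
    trace = [entry]
    visited.add(entry)
    cur = entry
    while cfg.get(cur):
        succ, _count = max(cfg[cur].items(), key=lambda kv: kv[1])
        if succ in visited:
            break
        trace.append(succ)
        visited.add(succ)
        cur = succ
    return trace


def tail_duplicate(cfg: CFG, trace: list[str]) -> CFG:
    """Return a new CFG in which `trace` has no side entrances.

    Every block from the first side entrance to the end of the trace is
    copied (copies get a hypothetical "'" suffix), and each side-entering
    edge is redirected to the copy. Profile counts are copied unchanged;
    a real compiler would also rescale them. The input CFG is not modified."""
    preds = predecessors(cfg)
    new = {b: dict(s) for b, s in cfg.items()}
    first_side = next((i for i in range(1, len(trace))
                       if preds[trace[i]] - {trace[i - 1]}), None)
    if first_side is None:
        return new  # already single-entry: the trace is a superblock
    tail = trace[first_side:]
    copy = {b: b + "'" for b in tail}
    for i, block in enumerate(tail):
        succs = dict(cfg.get(block, {}))
        if i + 1 < len(tail):
            # Inside the duplicated tail, the trace edge leads to the next copy.
            nxt = tail[i + 1]
            succs[copy[nxt]] = succs.pop(nxt)
        new[copy[block]] = succs
    # Redirect side entrances (any predecessor other than the previous trace
    # block) from the original tail block to its copy.
    for i, block in enumerate(tail, start=first_side):
        for p in preds[block]:
            if p != trace[i - 1]:
                new[p][copy[block]] = new[p].pop(block)
    return new


if __name__ == "__main__":
    # Hypothetical if-then-else: A -> B is the hot path, A -> C the cold one,
    # and both paths join at D.
    cfg = {"A": {"B": 90, "C": 10}, "B": {"D": 90},
           "C": {"D": 10}, "D": {"E": 100}, "E": {}}
    trace = select_trace(cfg, "A", set())
    print(trace)  # ['A', 'B', 'D', 'E']
    sb = tail_duplicate(cfg, trace)
    print(sb["C"])  # the cold path now enters the duplicated block D'
```

In this example, the join block D is a side entrance into the hot trace A, B, D, E. After tail duplication, the trace can be entered only at A. The cold path through C continues through the copies D' and E'. This single-entry property lets the optimizer and scheduler move code along the hot path without compensating for entries from the unimportant path. The cost is increased code size.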




About this article

Cite this article

Hwu, W.M.W., Mahlke, S.A., Chen, W.Y. et al. The superblock: An effective technique for VLIW and superscalar compilation. J Supercomput 7, 229–248 (1993). https://doi.org/10.1007/BF01205185
