ABSTRACT
High-performance compilers increasingly rely on accurate modeling of machine resources to efficiently exploit the instruction-level parallelism of an application. In this paper, we propose a reduced machine description that results in faster detection of resource contentions while preserving the scheduling constraints present in the original machine description. The proposed approach reduces a machine description in an automated, error-free, and efficient fashion. Moreover, it fully supports schedulers that backtrack and process operations in arbitrary order. Reduced descriptions for the DEC Alpha 21064, MIPS R3000/R3010, and Cydra 5 result in 4 to 7 times faster detection of resource contentions and require 22% to 90% of the memory storage used by the original machine descriptions. Precise measurement for the Cydra 5 indicates that reducing the machine description results in a 2.9 times faster contention query module.
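For readers unfamiliar with contention queries, the following is a minimal sketch of what such a module might look like when a machine description is stored as per-operation reservation tables. It is not the reduction technique proposed in the paper. The operation names, resource bit assignments, and the `ContentionQuery` class are all hypothetical and chosen only for illustration. The sketch shows the three operations a backtracking scheduler that places operations in arbitrary order would plausibly need: check, reserve, and release.

```python
from typing import Dict, List

# Hypothetical reservation tables (illustrative values, not from the paper).
# Each operation maps to a list of per-cycle bitmasks; bit r set at index c
# means the operation uses resource r at c cycles after issue.
RESERVATION_TABLES: Dict[str, List[int]] = {
    "add": [0b001, 0b010],
    "mul": [0b001, 0b100, 0b100],
}


class ContentionQuery:
    """Tracks which resources are reserved in each schedule cycle."""

    def __init__(self) -> None:
        # Maps a schedule cycle to the bitmask of resources reserved in it.
        self.busy: Dict[int, int] = {}

    def conflicts(self, op: str, cycle: int) -> bool:
        """Return True if issuing `op` at `cycle` would reuse a busy resource."""
        return any(
            self.busy.get(cycle + offset, 0) & mask
            for offset, mask in enumerate(RESERVATION_TABLES[op])
        )

    def reserve(self, op: str, cycle: int) -> None:
        """Mark the resources of `op`, issued at `cycle`, as busy."""
        for offset, mask in enumerate(RESERVATION_TABLES[op]):
            self.busy[cycle + offset] = self.busy.get(cycle + offset, 0) | mask

    def release(self, op: str, cycle: int) -> None:
        """Undo an earlier reserve of `op` at `cycle` (used when backtracking).

        Assumes the reservation being undone was contention-free, so clearing
        its bits cannot affect another scheduled operation.
        """
        for offset, mask in enumerate(RESERVATION_TABLES[op]):
            self.busy[cycle + offset] = self.busy.get(cycle + offset, 0) & ~mask
```

In a layout like this, the cost of each query grows with the number of table entries that must be tested, which gives one informal way to see why a smaller description that preserves the same scheduling constraints could answer queries faster.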
Index Terms
- A reduced multipipeline machine description that preserves scheduling constraints