Abstract
We observe a non-negligible fraction--3 to 16% in our benchmarks--of dynamically dead instructions, dynamic instruction instances that generate unused results. The majority of these instructions arise from static instructions that also produce useful results. We find that compiler optimization (specifically instruction scheduling) creates a significant portion of these partially dead static instructions. We show that most of the dynamically dead instructions arise from a small set of static instructions that produce dead values most of the time.

We leverage this locality by proposing a dead instruction predictor and presenting a scheme to avoid the execution of predicted-dead instructions. Our predictor achieves an accuracy of 93% while identifying over 91% of the dead instructions using less than 5 KB of state. We achieve such high accuracies by leveraging future control flow information (i.e., branch predictions) to distinguish between useless and useful instances of the same static instruction.

We then present a mechanism to avoid the register allocation, instruction scheduling, and execution of predicted-dead instructions. We measure reductions in resource utilization averaging over 5% and sometimes exceeding 10%, covering physical register management (allocation and freeing), register file read and write traffic, and data cache accesses. Performance improves by an average of 3.6% on an architecture exhibiting resource contention. Additionally, our scheme frees future compilers from the need to consider the costs of dead instructions, enabling more aggressive code motion and optimization. At the same time, it reduces the need for good path profiling information when making inter-block code motion decisions.
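To make the two central ideas concrete, the following is a minimal sketch, not the paper's implementation. It assumes a simplified trace format of `(pc, dest_reg, src_regs)` tuples and treats an instance as dead only when its register result is overwritten before any read; instances that are dead because their only consumers are dead, and values still unread at the end of the trace, are not counted. The predictor is likewise hypothetical: the table size, the hash, the counter policy, and the names `find_dead_instances` and `DeadPredictor` are illustrative assumptions. It only shows how combining the instruction address with predicted future branch outcomes can separate useless from useful instances of the same static instruction.

```python
def find_dead_instances(trace):
    """Return indices of trace entries whose result is overwritten unread.

    trace: list of (pc, dest_reg, src_regs); dest_reg may be None.
    Assumption: only directly dead register results are detected.
    """
    last_writer = {}  # register -> trace index of its most recent writer
    was_read = {}     # trace index -> True once its result has been read
    dead = []
    for i, (pc, dest, srcs) in enumerate(trace):
        # Sources are read before the destination is written.
        for reg in srcs:
            if reg in last_writer:
                was_read[last_writer[reg]] = True
        if dest is not None:
            prev = last_writer.get(dest)
            if prev is not None and not was_read[prev]:
                dead.append(prev)  # previous value was never consumed
            last_writer[dest] = i
            was_read[i] = False
    return sorted(dead)


class DeadPredictor:
    """Hypothetical table of saturating counters indexed by PC and future path."""

    def __init__(self, entries=1024, threshold=2, max_count=3):
        self.entries = entries
        self.threshold = threshold
        self.max_count = max_count
        self.table = [0] * entries

    def _index(self, pc, future_branches):
        # Fold the predicted outcomes of upcoming branches into a bit string.
        path = 0
        for taken in future_branches:
            path = (path << 1) | int(taken)
        # Illustrative hash only; any mixing of pc and path would do.
        return (pc ^ (path * 0x9E3779B1)) % self.entries

    def predict(self, pc, future_branches):
        """Predict dead when the counter has reached the threshold."""
        return self.table[self._index(pc, future_branches)] >= self.threshold

    def update(self, pc, future_branches, was_dead):
        """Train on the observed outcome; reset on a useful instance."""
        i = self._index(pc, future_branches)
        if was_dead:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = 0


# The static instruction at 0x10 is partially dead in this toy trace:
# its first instance is overwritten unread, its second is consumed.
trace = [
    (0x10, "r1", ()),
    (0x14, "r1", ()),
    (0x18, "r2", ("r1",)),
    (0x10, "r1", ()),
    (0x1C, "r3", ("r1",)),
]
dead_indices = find_dead_instances(trace)

predictor = DeadPredictor()
for _ in range(2):
    predictor.update(0x10, (True,), was_dead=True)
    predictor.update(0x10, (False,), was_dead=False)
```

In this sketch, resetting the counter on a useful instance biases the predictor toward accuracy over coverage, since wrongly skipping a useful instruction would require recovery; that trade-off is a design assumption of the example, not a claim about the predictor evaluated in the paper.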
Dynamic dead-instruction detection and elimination
Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems