ABSTRACT
In a single second a modern processor can execute billions of instructions. Obtaining a bird's-eye view of the behavior of a program at these speeds can be a difficult task when all that is available is cycle-by-cycle examination. In many programs, behavior is anything but steady state, and understanding the patterns of behavior, at run-time, can unlock a multitude of optimization opportunities. In this paper, we present a unified profiling architecture that can efficiently capture, classify, and predict phase-based program behavior on the largest of time scales. By examining the proportion of instructions that were executed from different sections of code, we can find generic phases that correspond to changes in behavior across many metrics. By classifying phases generically, we avoid the need to identify phases for each optimization, and enable a unified prediction scheme that can forecast future behavior. Our analysis shows that our design can capture phases that account for over 80% of execution using less than 500 bytes of on-chip memory.
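The capture, classify, and predict steps named in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design, only one plausible reading of "the proportion of instructions that were executed from different sections of code". Everything below is an assumption made for illustration: the bucket count, the match threshold, the address-to-bucket mapping, the Manhattan distance, and the names `signature`, `PhaseClassifier`, and `TransitionPredictor`.

```python
from collections import Counter

NUM_BUCKETS = 32   # hypothetical signature size (assumption, not from the paper)
THRESHOLD = 0.25   # hypothetical match threshold on the 0..2 distance scale


def signature(interval, num_buckets=NUM_BUCKETS):
    """Summarize one execution interval as per-bucket instruction proportions.

    `interval` is a list of (code_address, instruction_count) pairs, for
    example one pair per executed code region. Addresses are folded into a
    fixed number of buckets so every signature has the same size.
    """
    counts = [0] * num_buckets
    for address, instructions in interval:
        counts[(address >> 2) % num_buckets] += instructions
    total = sum(counts) or 1  # avoid division by zero on an empty interval
    return [c / total for c in counts]


def distance(a, b):
    """Manhattan distance between two normalized signatures (0 = same, 2 = disjoint)."""
    return sum(abs(x - y) for x, y in zip(a, b))


class PhaseClassifier:
    """Assigns a phase ID to each interval signature."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.known = []  # one representative signature per phase ID

    def classify(self, sig):
        # Reuse the first known phase whose representative is close enough.
        for phase_id, representative in enumerate(self.known):
            if distance(sig, representative) <= self.threshold:
                return phase_id
        # Otherwise this interval starts a new phase.
        self.known.append(sig)
        return len(self.known) - 1


class TransitionPredictor:
    """Predicts the next phase as the most frequent successor of the current one."""

    def __init__(self):
        self.transitions = {}  # phase ID -> Counter of observed next phase IDs
        self.prev = None

    def update(self, phase_id):
        if self.prev is not None:
            self.transitions.setdefault(self.prev, Counter())[phase_id] += 1
        self.prev = phase_id

    def predict(self):
        followers = self.transitions.get(self.prev)
        if not followers:
            return self.prev  # no history yet: assume the phase persists
        return followers.most_common(1)[0][0]
```

In this sketch, two intervals that spend similar fractions of their instructions in the same code buckets receive the same phase ID, regardless of which metric an optimization later cares about. Feeding each interval's signature to `classify` and the resulting ID to `update` gives a running phase trace, and `predict` then offers a guess for the next interval. The representative signature is never updated after a phase is first seen, which keeps the example short but would likely need revisiting in a real design.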
Phase tracking and prediction
ISCA 2003