ABSTRACT
Hardware data prefetch engines are integral parts of many general-purpose server-class microprocessors in the field today. Some prefetch engines allow the user to change some of their parameters. The prefetcher, however, is usually enabled in a default configuration during system bring-up, and dynamic reconfiguration of the prefetch engine is not an autonomic feature of current machines. Yet commonly used prefetch algorithms, when applied in a fixed mode, will not help performance in many cases. In fact, they may degrade performance through useless bus bandwidth consumption and cache pollution. In this paper, we present an adaptive prefetch scheme that dynamically modifies the prefetch settings to adapt to workload requirements. We implement and evaluate adaptive prefetching on an existing commercial processor, the IBM POWER7. Our adaptive prefetch mechanism improves performance over the default prefetch setting by up to 2.7X for single-threaded workloads and by up to 30% for multiprogrammed workloads.
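The abstract does not spell out how the adaptive scheme chooses a setting. As a rough illustration only, the sketch below assumes a simple explore-then-exploit loop: each candidate prefetch setting runs for one interval while a throughput metric is recorded, then the best-scoring setting runs for several intervals before the search repeats. The names `adaptive_prefetch`, `measure`, `exploit_len`, and the toy workload are hypothetical and are not taken from the paper. On real hardware, `measure` would have to apply the setting and read performance counters.

```python
from typing import Callable, Dict, List, Sequence


def adaptive_prefetch(
    settings: Sequence[int],
    measure: Callable[[int], float],
    intervals: int,
    exploit_len: int = 8,
) -> List[int]:
    """Return the prefetch setting used in each interval.

    `measure(setting)` is a hypothetical callback that runs one interval
    under `setting` and returns a throughput score (higher is better).
    """
    if not settings:
        raise ValueError("at least one candidate setting is required")

    history: List[int] = []
    while len(history) < intervals:
        # Exploration: run every candidate setting for one interval.
        scores: Dict[int, float] = {}
        for s in settings:
            if len(history) == intervals:
                return history
            scores[s] = measure(s)
            history.append(s)

        # Exploitation: keep the best-scoring setting for a while.
        best = max(scores, key=lambda k: scores[k])
        for _ in range(exploit_len):
            if len(history) == intervals:
                return history
            measure(best)
            history.append(best)
    return history


def make_toy_workload() -> Callable[[int], float]:
    """Hypothetical two-phase workload (an assumption for illustration).

    Setting 0 scores best for the first 11 intervals, setting 2 afterwards.
    """
    tick = {"n": 0}

    def measure(setting: int) -> float:
        phase_best = 0 if tick["n"] < 11 else 2
        tick["n"] += 1
        return 1.0 - 0.2 * abs(setting - phase_best)

    return measure


if __name__ == "__main__":
    used = adaptive_prefetch([0, 1, 2], make_toy_workload(), intervals=22)
    print(used)
```

In this toy run the loop settles on setting 0 during the first phase and switches to setting 2 after the next exploration round. This mirrors the abstract's point that a fixed default cannot suit every workload phase, but it says nothing about the accuracy or overhead of the actual mechanism.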
Index Terms
- Making data prefetch smarter: adaptive prefetching on POWER7