Abstract
Most previous work on cache analysis for WCET estimation assumes a particular replacement policy called LRU. In contrast, much less work has been done on non-LRU policies, since they are generally considered to be very unpredictable. However, most commercial processors are actually equipped with non-LRU policies, since they are more efficient in terms of hardware cost, power consumption and thermal output, while still maintaining almost as good average-case performance as LRU.
In this work, we study the analysis of MRU, a non-LRU replacement policy employed in mainstream processor architectures like Intel Nehalem. Our work shows that the predictability of MRU has been significantly underestimated, mainly because existing cache analysis techniques and metrics do not match MRU well. As our main technical contribution, we propose a new cache hit/miss classification, k-Miss, to better capture MRU behavior, and develop formal conditions and efficient techniques to decide k-Miss memory accesses. A remarkable feature of our analysis is that the k-Miss classifications under MRU are derived from the analysis result of the same program under LRU. Therefore, our approach inherits the efficiency and precision of the state-of-the-art LRU analysis techniques based on abstract interpretation. Experiments with instruction caches show that our proposed MRU analysis has both good precision and high efficiency, and that the estimated WCET is close to (typically 1% to 8% more than) that obtained by the state-of-the-art LRU analysis, which indicates that MRU is also a good candidate for cache replacement policies in real-time systems.
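The abstract does not spell out the MRU policy itself. As a rough illustration only, the sketch below simulates one cache set under the commonly described MRU-bit formulation: each way carries one bit that is set on access, all other bits are cleared once every bit would be set, and on a miss the first way whose bit is 0 is replaced. That formulation, the class names `MRUSet` and `LRUSet`, and the helper `count_misses` are assumptions made for this example and are not taken from the paper.

```python
from collections import OrderedDict


class MRUSet:
    """One cache set under a bit-based MRU policy (assumed formulation)."""

    def __init__(self, ways):
        assert ways >= 2, "this sketch needs at least two ways"
        self.lines = [None] * ways  # block held by each way (None = empty)
        self.bits = [0] * ways      # one MRU-bit per way

    def _touch(self, way):
        self.bits[way] = 1
        if all(self.bits):
            # Every bit is set: clear them all except the way just accessed.
            self.bits = [0] * len(self.bits)
            self.bits[way] = 1

    def access(self, block):
        """Return True on a hit, False on a miss. Blocks must not be None."""
        if block in self.lines:
            self._touch(self.lines.index(block))
            return True
        way = self.bits.index(0)    # first way whose MRU-bit is 0
        self.lines[way] = block
        self._touch(way)
        return False


class LRUSet:
    """One cache set under LRU, for comparison."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # ordered from least to most recently used

    def access(self, block):
        if block in self.lines:
            self.lines.move_to_end(block)
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[block] = True
        return False


def count_misses(cache_set, trace):
    """Replay a sequence of block identifiers and count the misses."""
    return sum(0 if cache_set.access(block) else 1 for block in trace)


if __name__ == "__main__":
    # A loop over five blocks mapped to one 4-way set, executed twice.
    trace = ["a", "b", "c", "d", "e"] * 2
    print("LRU misses:", count_misses(LRUSet(4), trace))
    print("MRU misses:", count_misses(MRUSet(4), trace))
```

Replaying a loop whose working set exceeds the associativity is one simple way to observe that the two policies can diverge on the same access sequence. A simulation of a single trace says nothing about worst-case bounds, which is what the static analysis in the paper addresses.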
Index Terms
- WCET analysis with MRU cache: Challenging LRU for predictability