ABSTRACT
Good spatial locality alleviates both the latency and the bandwidth problems of memory by boosting the effect of prefetching and improving cache utilization. However, conventional definitions of spatial locality are inadequate for a programmer to precisely quantify the quality of a program, to identify the causes of poor locality, and to estimate how much spatial locality can be improved.
This paper describes a new, component-based model of spatial locality. The model measures how reuse distances change as a function of the data-block size, and it divides spatial locality into components at the program and behavior levels. The base model is costly because it must track the locality of every memory access, but the overhead can be reduced by using small inputs and by extending a sampling-based tool. The paper presents the results of the analysis for a large set of benchmarks, the cost of the analysis, and the experience of a user study in which the analysis helped to locate a data-layout problem and improve performance by 7% with a 6-line change in an application of over 2,000 lines.
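The core measurement, the change in reuse distance when the data-block size changes, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `reuse_distances` and `distance_change`, the byte-address trace, and the choice to compare a block size against its double are assumptions made for illustration. The sketch tracks every access with a simple LRU stack, which is the costly exhaustive approach the abstract mentions, not the sampling-based one.

```python
from collections import OrderedDict


def reuse_distances(trace, block_size):
    """Return the reuse distance of each access at the given block granularity.

    The reuse distance is the number of distinct blocks touched between two
    consecutive accesses to the same block. First accesses get None.
    """
    stack = OrderedDict()  # blocks ordered from least to most recently used
    distances = []
    for addr in trace:
        block = addr // block_size
        if block in stack:
            # Count the distinct blocks accessed more recently than `block`.
            keys = list(stack.keys())
            distances.append(len(keys) - 1 - keys.index(block))
            stack.move_to_end(block)
        else:
            distances.append(None)
            stack[block] = True
    return distances


def distance_change(trace, block_size):
    """Pair each access's reuse distance at block_size with that at 2 * block_size.

    Hypothetical helper: an access whose distance shrinks (or that turns from a
    first access into a reuse) when the block doubles benefits from spatial reuse.
    """
    small = reuse_distances(trace, block_size)
    large = reuse_distances(trace, 2 * block_size)
    return list(zip(small, large))


if __name__ == "__main__":
    # Hypothetical byte-address trace: three 8-byte elements traversed twice.
    trace = [0, 8, 16, 0, 8, 16]
    for addr, (d_small, d_large) in zip(trace, distance_change(trace, 8)):
        print(addr, d_small, d_large)
```

In this toy trace, doubling the block from 8 to 16 bytes turns the first access to address 8 into a reuse and shortens the later reuse distances, which is the kind of change the model attributes to spatial locality.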