ABSTRACT
In this paper we present an optimisation for garbage collection based on reference counting. The optimisation aims to reduce the total number of calls to the heap manager while preserving the key benefits of reference counting, i.e. the opportunities for in-place updates and memory deallocation without global garbage collection. The key idea is to carefully extend the lifetime of variables so that a memory deallocation followed by a memory allocation of the same size can be replaced by direct reuse of that memory. Such memory reuse turns out to be particularly useful in the innermost loops of compute-intensive applications. It leads to a runtime behaviour that performs pointer swaps between buffers, in the same way as they would be implemented manually in languages that require explicit memory management, e.g. C.
We have implemented the proposed optimisation in the context of the Single-Assignment C compiler tool chain. The paper provides an algorithmic description of our optimisation and an evaluation of its effectiveness over a collection of benchmarks, including a subset of the Rodinia benchmarks and the NAS Parallel Benchmarks. We show that, for several benchmarks with allocations within loops, our optimisation reduces the number of allocations by a few orders of magnitude. We also observe no negative impact on either the overall memory footprint or the overall runtime. Instead, for some sequential executions we find a mild improvement, and on GPU devices we observe speedups of up to a factor of 4.
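The buffer swap mentioned in the abstract can be illustrated with a small hand-written C sketch. This is not output of the Single-Assignment C compiler and not the paper's algorithm; it only contrasts an allocate-then-free loop with the manual two-buffer pattern that the optimisation is said to reproduce. All names (`alloc_buffer`, `relax_step`, `relax_naive`, `relax_reuse`, `alloc_calls`) and the three-point averaging kernel are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter standing in for "calls to the heap manager". */
static size_t alloc_calls = 0;

/* Allocate n doubles and record the call. */
static double *alloc_buffer(size_t n) {
    double *p = malloc(n * sizeof *p);
    assert(p != NULL);
    alloc_calls++;
    return p;
}

/* One smoothing step (illustrative kernel): interior elements become the
 * mean of their three-point neighbourhood, boundary elements are copied. */
static void relax_step(const double *in, double *out, size_t n) {
    assert(n >= 2);
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}

/* Naive scheme: every iteration allocates a fresh result buffer and then
 * frees its predecessor, which has exactly the same size. */
static double *relax_naive(double *a, size_t n, int steps) {
    for (int s = 0; s < steps; s++) {
        double *b = alloc_buffer(n);
        relax_step(a, b, n);
        free(a);            /* deallocation ...                         */
        a = b;              /* ... followed next iteration by same-size */
    }                       /*     allocation                           */
    return a;
}

/* Reuse scheme: the lifetime of the old buffer is extended across the
 * iteration so it can serve as the next result; only pointers are swapped. */
static double *relax_reuse(double *a, size_t n, int steps) {
    double *b = alloc_buffer(n);   /* the single extra buffer */
    for (int s = 0; s < steps; s++) {
        relax_step(a, b, n);
        double *tmp = a;           /* pointer swap instead of free + malloc */
        a = b;
        b = tmp;
    }
    free(b);
    return a;
}
```

In this sketch, `relax_naive` calls the allocator once per iteration, whereas `relax_reuse` calls it once in total, and both return the same values for the same input. Both functions take ownership of the buffer passed in and return a buffer that the caller must free.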
Index Terms
- Extended Memory Reuse: An Optimisation for Reducing Memory Allocations