skip to main content
10.1145/3310232.3310242acmotherconferencesArticle/Chapter ViewAbstractPublication PagesiflConference Proceedingsconference-collections
research-article

Extended Memory Reuse: An Optimisation for Reducing Memory Allocations

Published:05 September 2018Publication History

ABSTRACT

In this paper we present an optimisation for reference counting based garbage collection. The optimisation aims at reducing the total number of calls to the heap manager while preserving the key benefits of reference counting, i.e. the opportunities for in-place updates as well as memory deallocation without global garbage collection. The key idea is to carefully extend the lifetime of variables so that memory deallocations followed by memory allocations of the same size can be replaced by a direct memory reuse. Such memory reuse turns out particularly useful in the context of innermost loops of compute-intensive applications. It leads to a runtime behaviour that performs pointer swaps between buffers in the same way it would be implemented manually in languages that require explicit memory management, e.g. C.

We have implemented the proposed optimisation in the context of the Single-Assignment C compiler tool chain. The paper provides an algorithmic description of our optimisation and an evaluation of its effectiveness over a collection of benchmarks including a subset of the Rodinia benchmarks and the NAS Parallel Benchmarks. We show that for several benchmarks with allocations within loops our optimisation reduces the amount of allocations by a few orders of magnitude. We also observe no negative impact on the overall memory footprint nor on the overall runtime. Instead, for some sequential executions we find mild improvement, and on GPU devices we observe speedups of up to a factor of 4x.

References

  1. David H Bailey, Eric Barszcz, John T Barton, David S Browning, Robert L Carter, Leonardo Dagum, Rod A Fatoohi, Paul O Frederickson, Thomas A Lasinski, Rob S Schreiber, et al. 1991. The NAS Parallel Benchmarks. The International Journal of Supercomputing Applications 5, 3 (1991), 63--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Robert Bernecky, Stephan Herhut, Sven-Bodo Scholz, Kai Trojahner, Clemens Grelck, and Alex Shafarenko. 2007. Index Vector Elimination: Making Index Vectors Affordable. In Implementation and Application of Functional Languages, 18th International Symposium (IFL'06), Budapest, Hungary, Revised Selected Papers (Lecture Notes in Computer Science), Zoltan Horváth, Viktória Zsók, and Andrew Butterfield (Eds.), Vol. 4449. Springer, 19--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Robert Bernecky and Sven-Bodo Scholz. 2015. Abstract Expressionism for Parallel Performance. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. ACM, 54--59. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. David C. Cann. 1989. Compilation Techniques for High-performance Applicative Computation. Ph.D. Dissertation. Fort Collins, CO, USA. AAI9007070. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. David C. Cann and Paraskevas Evripidou. 1995. Advanced array optimizations for high performance functional languages. IEEE Transactions on Parallel and Distributed Systems 6, 3 (March 1995), 229--239. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Shuai Che, M. Boyer, Jiayuan Meng, D. Tarjan, J. W. Sheaffer, Sang-Ha Lee, and K. Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In IEEE International Symposium on Workload Characterization (IISWC '09). 44--54. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. David L. Detlefs, Paul A. Martin, Mark Moir, and Guy L. Steele Jr. 2002. Lockfree reference counting. Distributed Computing 15, 4 (01 Dec 2002), 255--271. Google ScholarGoogle Scholar
  8. Steven M. Fitzgerald and Rodney R. Oldehoeft. 1996. Update-in-place Analysis for True Multidimensional Arrays. Sci. Program. 5, 2 (July 1996), 147--160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Edinburgh Centre for Robotics. 2014. Robotarium Cluster. The cluster is part of the EPSRC Centre for Doctoral Training in Robotics and Autonomous Systems (RAS) in Edinburgh grant (EP/L016834/1) funded by The Engineering and Physical Sciences Research Council (EPSRC) (UK).Google ScholarGoogle Scholar
  10. Python Software Foundation. 2018. Python 3.7 Language Documentation. https://docs.python.org/3/c-api/index.html Online, accessed 14 August 2018.Google ScholarGoogle Scholar
  11. Clemens Grelck. 2012. Single Assignment C (SaC): High Productivity Meets High Performance. In 4th Central European Functional Programming Summer School (CEFP'11), Budapest, Hungary (Lecture Notes in Computer Science), V. Zsók, Z. Horváth, and R. Plasmeijer (Eds.), Vol. 7241. Springer, 207--278. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Clemens Grelck and Kai Trojahner. 2004. Implicit Memory Management for SaC. In Implementation and Application of Functional Languages, 16th International Workshop, IFL'04, Clemens Grelck and Frank Huch (Eds.). University of Kiel, Institute of Computer Science and Applied Mathematics, 335--348. Technical Report 0408.Google ScholarGoogle Scholar
  13. Jing Guo, Robert Bernecky, Jeyarajan Thiyagalingam, and Sven-Bodo Scholz. 2014. Polyhedral Methods for Improving Parallel Update-in-Place. In Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, Sanjay Rajopadhye and Sven Verdoolaege (Eds.). Vienna, Austria.Google ScholarGoogle Scholar
  14. Jing Guo, Jeyarajan Thiyagalingam, and Sven-Bodo Scholz. 2011. Breaking the Gpu Programming Barrier with the Auto-parallelising Sac Compiler. In 6th Workshop on Declarative Aspects of Multicore Programming (DAMP'11), Austin, USA. ACM Press, 15--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Jurriaan Hage and Stefan Holdermans. 2008. Heap recycling for lazy languages. In Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation, PEPM 2008, San Francisco, California, USA, January 7--8, 2008. 189--197. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. W. Hamilton and S. B. Jones. 1991. Compile-Time Garbage Collection by Necessity Analysis. In Functional Programming, Glasgow 1990, Simon L. Peyton Jones, Graham Hutton, and Carsten Kehler Holst (Eds.). Springer London, London, 66--70.Google ScholarGoogle Scholar
  17. Paul Hudak and Adrienne Bloss. 1985. The Aggregate Update Problem in Functional Programming Systems. In Proceedings of the 12th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '85). ACM, New York, NY, USA, 300--314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Apple Inc. 2018. Swift Language Documentation. https://docs.swift.org/swift-book/ Online, accessed 14 August 2018.Google ScholarGoogle Scholar
  19. Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P.Johnson, Stephen R. Beard, and David I. August. 2011. Automatic CPU-GPU Communication Management and Optimization. SIGPLAN Not. 46, 6 (June 2011), 142--151. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Andreas Kågedal and Saumya Debray. 1997. A Practical Approach to Structure Reuse of Arrays in Single Assignment Languages. In Proceedings of the 14th International Conference on Logic Programming. MIT Press, 18--32.Google ScholarGoogle Scholar
  21. Akash Lal and G. Ramalingam. 2010. Reference Count Analysis with Shallow Aliasing. Inform. Process. Lett. 111, 2 (Dec. 2010), 57--63. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Oukseh Lee, Hongseok Yang, and Kwangkeun Yi. 2003. Inserting Safe Memory Reuse Commands into ML-Like Programs. In Static Analysis, 10th International Symposium, SAS 2003, San Diego, CA, USA, June 11-13, 2003, Proceedings. 171--188. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Frank H. McMahon. 1986. The Livermore Fortran Kernels: A computer test of the numerical performance range. Technical Report UCRL-53745. Lawrence Livermore National Lab., CA, USA.Google ScholarGoogle Scholar
  24. NVIDIA Corporation 2018. CUDA C Programming Guide (9.0.176 ed.). NVIDIA Corporation. https://docs.nvidia.com/cuda/archive/9.0/ Online, accessed 12 Aug. 2018.Google ScholarGoogle Scholar
  25. Young Park and Benjamin Goldberg. 1995. Static analysis for optimizing reference counting. Inform. Process. Lett. 55, 4 (1995), 229--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. SaC Development Team 2016. SaC EBNF Grammar. SaC Development Team. Available online at http://www.sac-home.org/doku.php?id=docs:syntax.Google ScholarGoogle Scholar
  27. Kazuki Sakamoto and Tomohiko Furumoto. 2012. Life Before Automatic Reference Counting. Apress, Berkeley, CA, 1--29.Google ScholarGoogle Scholar
  28. Sven-Bodo Scholz. 1997. An Overview of Sc Sac -- a Functional Language for Numerical Applications. In Programming Languages and Fundamentals of Programming, Technical Report 9717, R. Berghammer and F. Simon (Eds.). Institut für Informatik und Praktische Mathematik, Universität Kiel.Google ScholarGoogle Scholar
  29. Sven-Bodo Scholz. 2003. Single Assignment C: efficient support for high-level array operations in a functional setting. Journal of Functional Programming 13, 6 (2003), 1005--1059. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Rifat Shahriyar, Stephen M. Blackburn, and Daniel Frampton. 2012. Down for the Count? Getting Reference Counting Back in the Ring. SIGPLAN Not. 47, 11 (June 2012), 73--84. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. H. Sundell. 2005. Wait-free reference counting and memory management. In 19th IEEE International Parallel and Distributed Processing Symposium. 10 pp.--. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Hans-Nikolai Vießmann, Sven-Bodo Scholz, Artjoms Šinkarovs, Brian Bainbridge, Brian Hamilton, and Simon Flower. 2015. Making Fortran Legacy Code More Functional. 27th Symposium on Implementation and Application of Functional Languages (IFL '15) (2015).Google ScholarGoogle Scholar

Index Terms

  1. Extended Memory Reuse: An Optimisation for Reducing Memory Allocations

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          IFL '18: Proceedings of the 30th Symposium on Implementation and Application of Functional Languages
          September 2018
          143 pages
          ISBN:9781450371438
          DOI:10.1145/3310232

          Copyright © 2018 Owner/Author

          This work is licensed under a Creative Commons Attribution International 4.0 License.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 5 September 2018

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate19of36submissions,53%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader